ABSTRACT

The multi-backend database system (MBDS) in the Laboratory for
Database Systems Research at the Naval Postgraduate School is
designed to overcome the performance-gain and capacity-growth
problems of either the traditional database system or the
single-backend software database system. The original MBDS
supported four primary operations, namely, RETRIEVE, DELETE,
UPDATE and INSERT.

This thesis presents the design and implementation of the fifth
primary operation, the RETRIEVE-COMMON operation. The
retrieve-common operation is used to merge two files by their
common attribute values. First, the overall design and
implementation of MBDS is reviewed. Then, several alternatives
are compared and analyzed to select the best one as our design
and implementation approach. Finally, we describe the detailed
design and the implementation. Our goal is to maximize the
utilization of, and minimize the effects on, the existing system.

To integrate our design into MBDS, several modifications are
made. The algorithms for the modifications and their program
specifications are also provided in Chapters IV and V and the
Appendices.
I.  INTRODUCTION

A.   THE SCOPE OF THE THESIS

A database is a collection of stored operational data, and a
database system is a computer-based system whose overall purpose
is to record and maintain information (data) [Ref. 1]. The
traditional approach to managing a database system is to run the
database system software as an application program in a
mainframe computer system. The database system must share the
use and the control of the mainframe computer resources with all
of the other applications of the computer system. The
performance of this approach suffers whenever the usage of
either the computer system or the database applications
increases.

One solution to this problem is to offload the database system
from the mainframe to a single, dedicated backend computer. The
backend computer has its own disk storage and is used to perform
database operations exclusively [Refs. 2,3]. This approach is
known as the software single-backend approach. Database systems
based on this approach are referred to as software
single-backend database systems. However, this approach still
has a disadvantage: performance upgrades will require the
replacement of the backend, and this may entail software
modifications and hardware disruption [Ref. 4 : p. 4].

A  second  approach  to  solve  the  database  performance 
problem  is  to  develop  a  special-purpose  database  machine 
with  specially  designed  hardware.  However,  the 
cost-effectiveness  of  this  approach,  known  as  the  hardware 
backend  approach,  has  not  yet  been  demonstrated  [Ref.  5]. 


In order to overcome the performance-gain and capacity-growth
problems of either the traditional database system or the
software single-backend system, research on a multi-backend
database system, known as MBDS, is being conducted in the
Laboratory for Database Systems Research at the Naval
Postgraduate School. Instead of a single backend computer, MBDS
uses several identical (both in hardware and in software)
minicomputers as its backend computers in a parallel fashion in
order to achieve performance gain and capacity growth. These
backends with their respective disk systems are connected with
another minicomputer, called the backend controller. The
controller is responsible for supervising the execution of
parallel database operations on the backends and for interfacing
with the hosts and the user. Users access the system either by
way of the host or through the controller directly (as shown in
Figure 1.1).
Figure 1.1    The Multi-Backend Database System (MBDS).



The  attribute-based  data  language  (ABDL)  [Ref.  6]  is 
used  as  the  basis  of  the  data  language  of  MBDS.  Currently, 
ABDL  supports  four  primary  database  operations,  RETRIEVE, 
DELETE,  UPDATE  and  INSERT.  The  functions  of  these  four 
database  operations  are  shown  in  Figure  1.2. 


Operation      Function

RETRIEVE       Retrieve records from the database
DELETE         Delete records from the database
UPDATE         Modify records of the database
INSERT         Insert records into the database

Figure 1.2    The Functions of the Current MBDS Database
Operations.

In order to make MBDS a more complete database system, a fifth
operation, the RETRIEVE-COMMON operation, which is used to merge
two files by common attribute values, has been proposed
[Ref. 7]. This thesis focuses on the design and implementation
of the RETRIEVE-COMMON operation of MBDS. We propose several
alternative design and implementation strategies, then evaluate
and analyze these alternatives based on their time complexities,
their effects on the existing system and the design goals of
MBDS. According to the results of the analysis, we choose the
best alternative to design and implement the fifth operation.
B.   THE ORGANIZATION OF THE THESIS

The rest of this thesis is organized as follows. In Chapter II
we give an overview of the architecture of MBDS. We describe the
design goals, the underlying and intended hardware, the process
structure, the data model and the data language of MBDS. In
Chapter III, we first define the intended operation and the
syntax of the RETRIEVE-COMMON operation, and then evaluate and
analyze the alternatives for the design and implementation.
According to the analysis, we select the best alternative for
adding the retrieve-common operation to MBDS. In Chapter IV, we
present the details of the design for the selected approach. We
also consider the possible effects of this approach on the
existing system. In Chapter V, we describe how to incorporate
our design into MBDS. Our goal is to minimize the effects of the
implementation. Finally, this thesis is summarized and concluded
in Chapter VI. It is hoped that this thesis will be of help to
future work on MBDS.
II.  THE MULTI-BACKEND DATABASE SYSTEM (MBDS)

In  this  chapter  we  will  briefly  review  the  configuration 
and  the  theory  of  operations  of  the  MBDS.  Most  of  the 
information  provided  in  this  chapter  has  been  extracted  from 
[Refs.  4,7  :  pp.  1-68,  7-20].  The  interested  readers  are 
encouraged  to  refer  to  the  references. 

A.   THE SYSTEM GOALS

As mentioned in Chapter I, MBDS is designed to overcome the
performance problems and upgrade issues of the traditional
mainframe-based or the software single-backend database system.
In other words, the overall goal for MBDS is to prove that:

(1)  the system is easily extensible; and

(2)  the performance gain and improvement should be proportional
to the multiplicity of processing and storage elements
[Ref. 4 : pp. 1-5].

In order to achieve the aforementioned goal, the design
requirements and their correlated design issues for designing
and implementing MBDS have been defined in [Ref. 7 : pp. 7-10].

1.   Design Requirements

There are three main design requirements for MBDS.

(1)  The system must be expandable.

(2)  Both the hardware and software are generic.

(3)  The database is evenly distributed across the disk systems
of the backends, and, in operation, transactions are processed
in parallel and concurrently by the backends.
The first two design requirements support the addition of
backends for performance enhancement and capacity growth by
adding new backends of the same type and by using existing
system software. With the third requirement, performance gain
(in terms of response-time reduction) and capacity growth (in
terms of response-time invariance) of the system are likely to
be in proportion to the number of backends of the system.

2.   Design Issues

There are several issues which must be resolved in order to meet
the design requirements of MBDS. The first issue concerns the
backend controller. As shown in Figure 1.1, the controller may
become a primary bottleneck of the system. In order to avoid
this problem, the functions of the controller should be
minimized and reduced to the pre-processing of the user
transactions, the post-processing of the transaction results,
the sending and receiving of data between the backends and the
host, and the arbitration of data insertion into the database.

The  second  design  issue  addresses  the 
characteristics  and  functionality  of  the  communication  bus 
between  the  controller  and  the  backends.  The  bus  should  be 
cost-effective  and  efficient  for  both  backend  communication 
and  backend  addition. 

The  third  class  of  issues  involves  the  backends  of 
the  system.  The  backends  must  have  identical  software  to 
allow  replication  of  the  software  on  a  new  backend. 
Additionally,  the  backends  must  have  complete  software  to 
perform  all  of  the  database  management  functions.  These 
functions  include  directory  management,  concurrency  control, 
record  processing  and  communication. 

The  fourth  design  issue  concerns  the  database.  The 
database  should  be  evenly  distributed  across  all  the  disk 
systems  of  the  backends. 
The  fifth  design  issue  is  on  the  choice  of  a  data
model  and  data  language.  The  data  model  should  easily 
support  the  required  data  distribution  and  the  data 
placement  of  the  database.  The  data  language  for  the  system 
is  of  course  based  on  the  chosen  data  model.  It  must 
capture  all  of  the  primary  operations  of  the  database 
system.  The  chosen  data  model  is  the  attribute-based  data 
model  and  the  data  language  is  the  attribute-based  data 
language. 

The sixth design issue focuses on minimizing the communications
traffic of the system. The controller should only communicate
with the backends for sending the pre-processed user
transaction, for arbitrating the data placement, and for
receiving results. The backends should only communicate with the
controller for sending the results of the user transactions.
Communication among backends should be held to a minimum.

The  seventh  issue  deals  with  the  directory  placement 
strategies.  In  order  to  enable  each  backend  to  perform  all 
the  database  management  functions  and  minimize  the 
communication  among  backends,  the  directory  data  are 
duplicated  at  each  backend. 

B.   THE UNDERLYING AND INTENDED HARDWARE

An overview of the MBDS hardware organization is shown in
Figure 2.1. User access is accomplished through a host computer
which in turn communicates with the controller. When a
transaction (either a request or a set of requests) is received,
the controller will broadcast the transaction to all the
backends. Since the data of all data files are evenly
distributed across all the backends, all backends can now
execute the same request in parallel. A queue of
requests is maintained in each backend. When a backend finishes
executing one request it will send the results of that request
to the controller and be able to start executing the next
request independently of the other backends.

Figure 2.1    The MBDS Hardware Organization.
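The broadcast-and-queue behavior described above can be illustrated with a minimal sketch. It is a sequential simulation written for this discussion, not MBDS code: the names Backend, broadcast and collect are hypothetical, a request is modeled as a simple Python function over one record, and real backends would run in parallel on separate machines.

```python
from collections import deque


class Backend:
    """One backend holding its own share of the evenly distributed database."""

    def __init__(self, records):
        self.records = records   # records stored on this backend's disks
        self.queue = deque()     # queue of pending requests

    def run_next(self):
        # Execute the oldest queued request against the local records only.
        request = self.queue.popleft()
        return [r for r in self.records if request(r)]


def broadcast(backends, request):
    # The controller sends the same request to every backend.
    for backend in backends:
        backend.queue.append(request)


def collect(backends):
    # The controller gathers the partial results returned by each backend.
    results = []
    for backend in backends:
        results.extend(backend.run_next())
    return results


backends = [
    Backend([{"DEPT": "Toy"}, {"DEPT": "Shoe"}]),
    Backend([{"DEPT": "Toy"}]),
]
broadcast(backends, lambda r: r["DEPT"] == "Toy")
toy_records = collect(backends)
```

Because each backend consults only its own queue and its own records, adding a backend in this sketch requires no change to the others, which mirrors the expandability requirement stated earlier.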

Originally, MBDS was designed to be configured with a number of
microprocessor-based processing units and their disk subsystems,
connected by a broadcast-based communications line. When the
implementation of MBDS began, neither the microprocessor-based
computers nor the broadcast-based communications devices were
available. The present MBDS is configured with a VAX-11/780
(VMS OS) as both the host and the controller and two PDP-11/44s
(RSX-11M OS) and their disk systems as the backends.
Communication between computers is accomplished by
time-division-multiplexed buses, known as parallel communication
links (PCLs). The broadcast bus is simulated by the PCL.

Currently, MBDS is being down-loaded to an initial configuration
of eight microprocessor-based, broadcast-bus-connected, and
Winchester-drive-supported workstations, with one of the eight
being used as the controller and the others as the backends.
This workstation (Sun-2/170, 4.2 BSD UNIX OS) has the Motorola
MC68010 as the CPU with 16 mbytes of virtual space per process
and uses Ethernet as the broadcast bus among workstations. The
disk drives on the backends are Fujitsu Eagle Winchester-type
drives, with a formatted capacity of 380 mbytes per drive.

C.   THE DATA MODEL AND THE DATA LANGUAGE

In this section we first introduce the concepts and terminology
of the attribute-based data model, which is the data model used
in MBDS, and then describe the data language in which users may
issue requests to MBDS.
1.   The Attribute-based Data Model

MBDS uses the attribute-based data model as its data model. In
the attribute-based data model, data is modeled with the
constructs: database, file, record, attribute-value pair
(keyword), directory keyword, directory, record body, keyword
predicate, and query. Informally, a database is a collection of
files; each file contains a group of records which are
characterized by a unique set of directory keywords. A record is
composed of two parts. The first part is a collection of
attribute-value pairs or keywords. An attribute-value pair is a
member of the Cartesian product of the attribute name and the
value domain of the attribute. As an example, <SALARY, 30000> is
an attribute-value pair having 30000 as the value for the
attribute SALARY. All the attributes in a record are required to
be distinct. Certain attribute-value pairs of a record (or a
file) are called the directory keywords of that record (file),
because either the attribute-value pairs or the ranges of their
attribute values are kept in the directory for addressing the
record (file). The rest of the record is textual information,
which is referred to as the record body.

The angle brackets, <, >, enclose an attribute-value pair. The
curly brackets, {, }, enclose the record body. The parentheses,
(, ), enclose a record. The first attribute-value pair of all
records of a file is the same. In particular, the attribute is
FILE and the value is the file name. An example of a record of
the Employee file is shown below:

(<FILE, Employee>, <JOB, Mgr>, <DEPT, Toy>, <SALARY, 30000>,
{Employee Description})

The record has four keywords and a record body of employee
description.
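As an illustration only, the employee record can be modeled in a few lines of Python. The class name Record and its fields are assumptions made for this sketch and are not part of MBDS. A dictionary is used for the keywords because its keys are unique, which matches the requirement that all attributes of a record be distinct.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """An attribute-based record: a set of keywords plus a record body."""

    keywords: dict    # attribute name -> value; keys are necessarily distinct
    body: str = ""    # the textual record body

    @property
    def file(self):
        # Every record carries a FILE keyword naming the file it belongs to.
        return self.keywords["FILE"]


employee = Record(
    keywords={"FILE": "Employee", "JOB": "Mgr", "DEPT": "Toy", "SALARY": 30000},
    body="Employee Description",
)
```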

A keyword predicate, or simply predicate, is of the form

(attribute, relational operator, value).

Where no confusion arises, we also use parentheses to enclose a
predicate. A relational operator can be one of (=, !=, <, =<, >,
>=). For example, (SALARY > 20000) is a predicate. A keyword K
is said to satisfy a predicate T if the attribute of K is
identical to the attribute in T and the relation specified by
the relational operator of T holds between the value of K and
the value in T. For example, the keyword <SALARY, 30030>
satisfies the predicate (SALARY > 20000).

A query consists of several keyword predicates in disjunctive
normal form. An example of a query is:

((DEPT=Toy) and ((SALARY<10000) or (SALARY>20000))).
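The satisfaction rule for keywords and predicates can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the representation of a query as a list of conjunctions (each a list of predicates) are ours, and the operator table simply maps the relational operators to Python comparisons.

```python
import operator

# Relational operators as written in the text ("=<" means less-than-or-equal).
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "=<": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def keyword_satisfies(keyword, predicate):
    """keyword is (attribute, value); predicate is (attribute, op, value)."""
    k_attr, k_val = keyword
    p_attr, op, p_val = predicate
    # The attributes must be identical and the relation must hold.
    return k_attr == p_attr and OPS[op](k_val, p_val)


def record_satisfies(keywords, query):
    """keywords is a dict; query is a list of conjunctions of predicates."""
    return any(
        all(
            any(keyword_satisfies(kw, pred) for kw in keywords.items())
            for pred in conjunction
        )
        for conjunction in query
    )


toy_manager = {"FILE": "Employee", "DEPT": "Toy", "SALARY": 30000}
well_paid_toy = [[("DEPT", "=", "Toy"), ("SALARY", ">", 20000)]]
```

A record satisfies the query when at least one conjunction has every one of its predicates satisfied by some keyword of the record.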

2.   The Attribute-based Data Language

The data manipulation language for MBDS, the attribute-based
data language (ABDL), is a non-procedural language which
originally supported four primary database operations: RETRIEVE,
INSERT, DELETE and UPDATE. It is the purpose of this thesis to
design and implement the fifth primary database operation, the
RETRIEVE-COMMON operation.

The RETRIEVE request is used to retrieve records of the
database. The syntax of a RETRIEVE request is shown below:

RETRIEVE Query (Target-List) [BY Attribute] [WITH Pointer]

The query specifies which records are to be retrieved. The
target-list is a list of output attributes. It may also contain
an aggregate operator on one or more output attributes. MBDS
supports five aggregate operators: AVG, COUNT, SUM, MIN and MAX.
The BY-clause and the WITH-clause are optional. The BY-clause
may be used to group records when an aggregate operation is
specified. The WITH-clause may be used to specify whether
pointers to the retrieved records must be returned to the user
or user program for later use in an update request. Some
examples of retrieve requests are shown below.

Example 1. Retrieve the names of all employees who work in the
Toy department.

RETRIEVE ((FILE=Employee) and (DEPT=Toy)) (NAME)

Example 2. List the average salary of all departments.

RETRIEVE (FILE=Employee) (AVG(SALARY)) BY DEPT
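To make the effect of the BY-clause concrete, the following sketch computes what Example 2 asks for over a small in-memory list of records. It is an illustration under our own assumptions: records are plain dictionaries, the function name retrieve_avg_by is hypothetical, and nothing here reflects how MBDS distributes aggregation between the backends and the controller.

```python
from collections import defaultdict


def retrieve_avg_by(records, file_name, target, by):
    """Average the target attribute of one file, grouped by another attribute."""
    groups = defaultdict(list)
    for record in records:
        # Only records of the named file that carry both attributes take part.
        if record.get("FILE") == file_name and target in record and by in record:
            groups[record[by]].append(record[target])
    return {group: sum(values) / len(values) for group, values in groups.items()}


employees = [
    {"FILE": "Employee", "DEPT": "Toy", "SALARY": 30000},
    {"FILE": "Employee", "DEPT": "Toy", "SALARY": 20000},
    {"FILE": "Employee", "DEPT": "Shoe", "SALARY": 10000},
]
averages = retrieve_avg_by(employees, "Employee", "SALARY", "DEPT")
```

With the sample data above, the Toy group averages two salaries and the Shoe group averages one.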

The INSERT request is used to insert a record into the database.
The syntax of an INSERT request is:

INSERT Record

The following example will insert a record into the Employee
file.

INSERT (<FILE, Employee>, <SALARY, 30000>, <DEPT, Toy>)

The syntax of a DELETE request is:

DELETE Query

where the query specifies the record(s) to be removed from the
database. The following example will delete records from the
Employee file.

DELETE ((FILE=Employee) and (SALARY=30000) and (DEPT=Toy))
The UPDATE request is used to modify records of the database.
The syntax of the UPDATE request is:

UPDATE Query <Modifier>

where the query specifies the particular records to be updated
and the modifier specifies the kinds of modification that need
to be done on records that satisfy the query. The following
example will give a $1000 raise to all employees.

UPDATE (FILE=Employee) <SALARY=SALARY+1000>

The RETRIEVE-COMMON request is used to merge two files by common
attribute values. It is discussed in detail in the later
chapters.
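The semantics of INSERT, DELETE and UPDATE can be sketched over an in-memory list of records. This is a simplified illustration, not the MBDS implementation: the function names are hypothetical, a query is modeled as a Python function that tests one record, and a modifier is modeled as a function from the old attribute value to the new one.

```python
def insert(db, record):
    # INSERT Record: add one record to the database.
    db.append(dict(record))


def delete(db, query):
    # DELETE Query: remove every record that satisfies the query.
    db[:] = [r for r in db if not query(r)]


def update(db, query, attribute, modifier):
    # UPDATE Query <Modifier>: apply the modifier to each satisfying record.
    for record in db:
        if query(record):
            record[attribute] = modifier(record[attribute])


db = []
insert(db, {"FILE": "Employee", "SALARY": 30000, "DEPT": "Toy"})
insert(db, {"FILE": "Employee", "SALARY": 15000, "DEPT": "Shoe"})

# Give a $1000 raise to all employees.
update(db, lambda r: r["FILE"] == "Employee", "SALARY", lambda s: s + 1000)

# Delete the Toy employees whose salary is now 31000.
delete(db, lambda r: r["FILE"] == "Employee"
       and r["SALARY"] == 31000 and r["DEPT"] == "Toy")
```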

D.   THE PROCESS STRUCTURE

MBDS is a message-oriented system. In a message-oriented system,
each process corresponds to one system function. These processes
communicate among themselves by passing messages. The processes
are created at system start time and exist until the system is
stopped. Figure 2.2 provides an overview of the MBDS process
structure.

1.   The Communication Processes

Communication between computers in MBDS is achieved by using the
PCL. MBDS provides a software abstraction of this bus for each
computer in order to emulate broadcast capabilities. The
abstraction consists of two complementary processes. The first
process, get-pcl, gets messages from other computers off the
PCL. The second process, put-pcl, puts messages on the bus to be
broadcast to other computers. Every computer, whether it is the
controller or a backend, has its own get-pcl and put-pcl.
[Diagram: the processes of the controller and the processes of
each backend, connected by the broadcast bus.]

Figure 2.2    The MBDS Process Structure.
There are 31 message types and one general message format used
in the MBDS message-passing facilities. The format (shown in
Figure 2.3) is used for each of the three message-passing
facilities, namely, messages within the controller, messages
within the backends, and messages between computers.


Message             Data Type

Message Type        a numeric code
Message Sender      a numeric code
Message Receiver    a numeric code
Message Text        an alphanumeric field terminated
                    by an end of message marker

Figure 2.3    The General Format of MBDS Messages.

Messages between computers are divided into two classes:
messages between backends and messages between the controller
and the backends. Figure 2.4 lists the MBDS message types.
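The general message format of Figure 2.3 can be pictured with a small sketch. Everything beyond the four fields themselves is an assumption made for illustration: the source does not state which character serves as the end of message marker, how the fields are separated, or which numeric codes identify the processes, so the null character, the comma separator and the sample codes below are hypothetical.

```python
from dataclasses import dataclass

END_OF_MESSAGE = "\0"   # hypothetical marker; the actual marker is not given
SEPARATOR = ","         # hypothetical field separator


@dataclass
class Message:
    msg_type: int   # numeric code, one of the 31 message types
    sender: int     # numeric code of the sending process
    receiver: int   # numeric code of the receiving process
    text: str       # alphanumeric message text

    def encode(self):
        header = SEPARATOR.join(str(n) for n in (self.msg_type, self.sender, self.receiver))
        return header + SEPARATOR + self.text + END_OF_MESSAGE

    @classmethod
    def decode(cls, raw):
        # The text runs up to the end of message marker.
        body = raw[: raw.index(END_OF_MESSAGE)]
        msg_type, sender, receiver, text = body.split(SEPARATOR, 3)
        return cls(int(msg_type), int(sender), int(receiver), text)


# Message type 1 is TRAFFIC UNIT in Figure 2.4; sender and receiver
# codes 0 and 1 are placeholders.
original = Message(1, 0, 1, "RETRIEVE (FILE=Employee) (NAME)")
round_trip = Message.decode(original.encode())
```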

2.   The Test Interface Process

The test interface process allows the user to interact with MBDS
directly. Since MBDS does not use a separate host computer, the
test interface process is contained in the controller.

3.   The Processes of the Controller

In addition to the communications and test-interface processes,
the controller consists of three additional processes: Request
Preparation (RP), Insert Information Generation (IIG) and Post
Processing (PP). RP receives, parses and formats a request
(transaction) before sending the formatted request (transaction)
to the directory-management process in each backend. IIG is used
to provide additional information to the backends when an insert
request is received. PP is used to collect all the results of a
request (transaction) and forward the results to the user.

4.   The Processes of Each Backend

In addition to the communication processes, each backend also
consists of three other processes: Record Processing (RP),
Directory Management (DM) and Concurrency Control (CC).

DM controls the execution of a request at a backend and accesses
the secondary-storage-based directory tables. It determines the
disk addresses where the relevant data of a particular request
are stored and then sends those disk addresses to RP.

CC is used to ensure the consistency of the database while
allowing concurrent execution of multiple requests.

RP performs the disk I/O operations and other operations
specified by the request. It receives the secondary-storage
addresses from DM, which processes the request. The results are
then forwarded to the controller.
MESSAGE-TYPE NUMBER AND NAME

 1  TRAFFIC UNIT
 2  REQUEST RESULTS
 3  NUMBER OF REQUESTS IN A TRANSACTION
 4  AGGREGATE OPERATORS
 5  REQUESTS WITH ERRORS
 6  PARSED TRAFFIC UNIT
 7  NEW DESCRIPTOR ID
 8  BACKEND NUMBER
 9  CLUSTER ID
10  REQUEST FOR NEW DESCRIPTOR ID
11  BACKEND RESULTS FOR A REQUEST
12  BACKEND AGGREGATE OPERATOR RESULTS
13  RECORD THAT HAS CHANGED CLUSTER
14  RESULTS OF A RETRIEVE OR FETCH CAUSED BY AN UPDATE
15  DESCRIPTOR IDS
16  REQUEST AND DISK ADDRESSES
17  CHANGED CLUSTER RESPONSE
18  FETCH
19  OLD AND NEW VALUES OF ATTRIBUTE BEING MODIFIED
20  TYPE-C ATTRIBUTES FOR A TRAFFIC UNIT
21  DESC-ID GROUPS FOR A TRAFFIC UNIT
22  CLUSTER IDS FOR A TRAFFIC UNIT
23  RELEASE ATTRIBUTE
24  RELEASE ALL ATTRIBUTES FOR AN INSERT
25  RELEASE DESCRIPTOR-ID GROUPS
26  ATTRIBUTE LOCKED
27  DESCRIPTOR-ID GROUPS LOCKED
28  CLUSTER IDS LOCKED
29  NO MORE GENERATED INSERTS
30  REQUEST ID OF A FINISHED REQUEST
31  AN UPDATE REQUEST HAS FINISHED

SOURCE OR DESTINATION DESIGNATION         PATH DESIGNATION

HOST   HOST MACHINE (TEST-INT)            H   HOST
REQP   REQUEST PREPARATION                C   CONTROLLER
IIG    INSERT INFORMATION GENERATION      C   CONTROLLER
PP     POST PROCESSING                    C   CONTROLLER
DM     DIRECTORY MANAGEMENT               B   A BACKEND
RECP   RECORD PROCESSING                  B   A BACKEND
CC     CONCURRENCY CONTROL                B   A BACKEND

Figure 2.4    The MBDS Message Types.
III.  DESIGN AND ANALYSIS OF THE RETRIEVE-COMMON REQUEST

In this chapter, we introduce the terminology and notation of
the "Retrieve-Common" request, investigate and analyze several
possible design and implementation approaches, and then select
the best one to design and implement the Retrieve-Common
operation for MBDS. The selection of an approach is based on the
design requirements and the design issues of MBDS.

A.   THE INTENDED OPERATION

1.   An Operation On Two Files

The RETRIEVE-COMMON request is used to merge two files by common
attribute values. The common attribute values are the attribute
values which belong to the records of both files. For example,
suppose there are two files: file A and file B. File A contains
the records of the street names of the city of San Jose:

(<FILE, A>, <STREET, MONTEREY>, <CITY, SAN JOSE>)
(<FILE, A>, <STREET, SECOND>, <CITY, SAN JOSE>)

File B consists of the records of the city names of Monterey
County:

(<FILE, B>, <CITY, MONTEREY>, <COUNTY, MONTEREY>)
(<FILE, B>, <CITY, SEASIDE>, <COUNTY, MONTEREY>)
The RETRIEVE-COMMON request can provide us with a third file,
say, file C, with information such as: "All the records of both
files A and B, where the street name of the records in file A is
identical to the city name of the records in file B." One of the
records in file C which satisfy the request would be

(<FILE, C>, <FILE, A>, <STREET, MONTEREY>, <CITY, SAN JOSE>,
<FILE, B>, <CITY, MONTEREY>, <COUNTY, MONTEREY>).

Logically, the retrieve-common request involves two retrieval
operations. We define the first retrieval operation as the
source retrieve and the second retrieval operation as the target
retrieve. The set of all the records that belong to the result
of the source retrieve is called the source record set. The set
of all the records that belong to the result of the target
retrieve is called the target record set. A source (target)
record is a record that belongs to the source (target) record
set. Similarly, the attributes of these records are referred to
as source attributes and target attributes. The merged source
and target records are termed the result record set. The
aforementioned file C is a result record set.

We term the source and target attribute names that participate
in the retrieve-common operation the join attribute names, or
briefly join attributes. Their values are termed common
attribute values, or simply common values. The retrieve-common
operation requires that the join attribute which is specified in
the source record set must have the same domain as that of the
join attribute in the target record set, although they need not
have the same attribute name.

Consider another example. Suppose the source records are
characterized by the attributes Employee_name and Wages, and the
target records are characterized by Rank and Wages. Further, let
the domain of Employee_name be the character string and the
domain of both Rank and Wages be the integer. A retrieve-common
operation may be performed by merging on the attribute values of
the wages of the respective source record and target record. A
retrieve-common operation may also be performed by merging on
the wages of the source record and the ranks of the target
record, since their value domains are the same. However, a merge
between the employee names and the ranks would not be permitted,
since their domains are different.

The logical operation for the retrieve-common request can be
described as follows.

(1)  All records satisfying the source retrieve are collected.

(2)  All records satisfying the target retrieve are collected.

(3)  The records of the two collections are pairwise merged on
the common (source and therefore target) attribute values.
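The three logical steps above can be sketched in a few lines. This is only an illustration of the logical operation on a single in-memory collection, under our own assumptions: records are dictionaries, the two retrieves are modeled as test functions, and matching target records are found through a lookup table keyed by the join value. It says nothing about where the merge is performed in MBDS, which is the subject of the analysis in this chapter. Each result is kept as a (source, target) pair because the merged record repeats attribute names such as FILE and CITY.

```python
def retrieve_common(records, source_query, target_query, source_attr, target_attr):
    # Step 1: collect all records satisfying the source retrieve.
    source_set = [r for r in records if source_query(r)]
    # Step 2: collect all records satisfying the target retrieve.
    target_set = [r for r in records if target_query(r)]

    # Step 3: pairwise merge on the common attribute values.
    targets_by_value = {}
    for target in target_set:
        if target_attr in target:
            targets_by_value.setdefault(target[target_attr], []).append(target)

    result = []
    for source in source_set:
        if source_attr not in source:
            continue
        for target in targets_by_value.get(source[source_attr], []):
            result.append((source, target))
    return result


records = [
    {"FILE": "A", "STREET": "MONTEREY", "CITY": "SAN JOSE"},
    {"FILE": "A", "STREET": "SECOND", "CITY": "SAN JOSE"},
    {"FILE": "B", "CITY": "MONTEREY", "COUNTY": "MONTEREY"},
    {"FILE": "B", "CITY": "SEASIDE", "COUNTY": "MONTEREY"},
]
merged = retrieve_common(
    records,
    lambda r: r["FILE"] == "A",   # source retrieve
    lambda r: r["FILE"] == "B",   # target retrieve
    "STREET",                     # join attribute of the source records
    "CITY",                       # join attribute of the target records
)
```

With the street and city files of the earlier example, the only pair produced is the Monterey street record of file A with the Monterey city record of file B.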


2.   The Syntax of the Retrieve-Common Request

When developing the syntax of the retrieve-common request, we
must attempt to design a data language construct that is
similar, syntactically, to the other primary operations of ABDL.
In particular, the syntax of the retrieve-common operation
should resemble the syntax of the ABDL retrieve operation given
below:

RETRIEVE Query (Target-list) [BY Attribute] [WITH Pointer]

Using the above syntax as a guideline, we define the syntax for
the retrieve-common request as follows.

RETRIEVE Query-1 (Target-list-1) [BY Attribute] [WITH Pointer]
COMMON (Attribute-1, Attribute-2)
RETRIEVE Query-2 (Target-list-2) [BY Attribute] [WITH Pointer]

The  retrieve-common  request  consists  of  three  parts. 
The  first  part  is  what  we  have  referred  to  as  the  source 
retrieve  request,  which  retrieves  the  source  record  set. 
The  second  part  is  the  specification  of  the  join  attributes, 
where  Attribute-1  belongs  to  the  source  record  and 
Attribute-2  belongs  to  the  target  record.  Although  the 
values  of  these  two  attributes  must  be  the  same  in  order  to 
satisfy  the  condition  for  merging  the  respective  records, 
their  attribute  names  need  not  be  identical.  The  third  part 
is what has been referred to as the target retrieve request,
which  retrieves  the  target  record  set. 

B.   AN  ANALYSIS  OF  DIFFERENT  DESIGNS 

In  order  to  make  this  thesis  self-contained,  several 
possible design approaches described in [Ref. 8] are
reviewed  in  this  section. 

The  main  issue  when  considering  alternative  strategies 
for  implementing  the  retrieve-common  request  is  where  the 
merge  of  the  source  and  the  target  records  should  be 
performed. 

There are three major alternatives for distributing the
workload of the retrieve-common request.

(1) The controller does all of the merge operation.

(2) The controller and the backends share the workload of
the merge.

(3) The backends do all of the merge operation.

Each  of  these  alternatives  will  be  analyzed  and  judged  using 
the  design  requirements  and  design  issues  of  MBDS. 

In  order  to  simplify  the  analysis  of  design  (or 
implementation)  strategies,  we  make  the  following 
assumptions.

(1) The records of the source record set and the records
of  the  target  record  set  are  distributed  evenly  across 
the  backends. 

(2)  The  operation  of  the  retrieve-common  is  performed  as 
described  in  the  previous  section. 

1.  The Controller Does All the Merge Operation

In this alternative, each backend performs only the two
retrieval operations and then sends the records of the
source record set and the records of the target record set
to the controller. Upon receiving all the source records and
target records from all the backends, the controller
performs the merging operation and sends the results to the
host computer.

2.  The Controller And The Backends Share The Merge
Operation

Each backend performs the merge operation over its own
source records and target records. The merged records, along
with the source and target record sets, are then sent to the
controller. The controller performs the merge operation
over the source and target record sets coming from different
backends and then sends the results, together with the
previously merged records (produced by the individual
backends), to the host.

3.  The Backends Do All the Merge Operation

This alternative may be further broken into two
subalternatives.

(a) The backends share the merge operation.

The backends send either source or target records to
each other. Let us assume that the target records are
sent. Each backend will then have a portion of the source
record set and the whole set of target records. Each
backend performs the merge operation over its own
source records and all of the target records, and
sends the results to the controller.

(b) One designated backend performs the merge operation.
All records of both the source record set and the
target record set are sent to the designated backend
from all of the other backends. The designated
backend performs the entire merge operation and sends
the results to the controller.

4.  An Analysis of the Design Approaches

Four alternatives for distributing the workload of
the merge operation among the controller and the backends
have been discussed in the previous subsections. We now
examine these alternatives against the design goals of MBDS.

Alternative 1, where the controller performs the
entire merge operation, will increase the workload of the
controller. Recall that in Chapter II we stressed that,
in order to reduce the chance of the controller being the
bottleneck of the system, we minimize the work of the
controller. Alternative 1 violates this design requirement.
Therefore, it will not be considered further.

Alternative  2  will  increase  the  communications  load 
and  increase  the  workload  of  the  controller.  This 
alternative  complicates  the  first  and  the  sixth  design 
issues of MBDS. Therefore, it will also be eliminated from
the  design  consideration. 

Alternative 3a meets the design issue of minimizing
the controller function and distributing the workload to
each backend evenly. Alternative 3b does not increase the
workload of the controller, but neither does it distribute
the workload across the backends. Furthermore, transmitting
all the records of both the source record set and the target
record set will increase the communications overhead. In
addition, performing the entire merge operation in one
backend will unbalance the workload, thereby reducing the
parallelism of the backends, i.e., a single backend does the
merge while all other backends idle. This complicates both
the third and sixth design issues, so this alternative is
also eliminated.

With this analysis we choose alternative 3a as
our design approach. That is, each backend performs a
partial merge with its portion of the source records and all
of the target records, and then sends its result to the
controller. The controller forwards the final result to the
host computer.
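
A minimal sketch of alternative 3a follows, under stated
assumptions: records are hypothetical (label, join value)
tuples, the backends are simulated by an ordinary loop rather
than run in parallel, and the function names are invented for
the illustration.

```python
from typing import List, Tuple

Record = Tuple[str, int]  # (label, join value); a hypothetical record layout


def partial_merge(local_source: List[Record],
                  all_targets: List[Record]) -> List[Tuple[Record, Record]]:
    """One backend merges its own source records with the whole target set."""
    return [(s, t) for s in local_source for t in all_targets if s[1] == t[1]]


def retrieve_common_3a(sources_by_backend: List[List[Record]],
                       targets_by_backend: List[List[Record]]
                       ) -> List[Tuple[Record, Record]]:
    # Broadcast step: afterwards every backend holds the entire target set.
    all_targets = [t for local in targets_by_backend for t in local]
    collected = []
    # The backends would work in parallel; a plain loop stands in for that.
    for local_source in sources_by_backend:
        collected.extend(partial_merge(local_source, all_targets))
    # The controller only gathers the partial results and forwards them.
    return collected


sources = [[("s1", 1), ("s2", 2)], [("s3", 2)]]
targets = [[("t1", 2)], [("t2", 1), ("t3", 4)]]
merged = retrieve_common_3a(sources, targets)
```

Because each source record lives on exactly one backend and
every backend sees all target records, the union of the
partial merges equals the merge of the full record sets, and
the controller does no comparison work at all.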

C.   AN ANALYSIS OF DIFFERENT IMPLEMENTATIONS

Three different implementations for merging the source
and the target record sets are considered.

(1) A straightforward implementation.

(2) An implementation based on sorting and matching.

(3) An implementation based on bucket-hashing.

1.  The Straightforward Implementation

The concept of this alternative is very simple and
the merging operation is based on the "nest-loop" algorithm
[Ref. 8: p. 86], which is shown in Figure 3.1.

This alternative is accomplished in five phases:

(1) Each backend determines its own source records and
stores them into a predefined portion of the secondary
storage area.

(2) Each backend determines its own target records and
stores them into the predefined portion of the
secondary storage area.

(3) Each backend broadcasts its own local target records
to all of the other backends.

(4) Each backend receives the broadcast target records
from the other backends and stores them into the
secondary storage together with its own target
records.

(5) Each backend brings its own source records and the
entire target record set into the primary memory,
performs the "nest-loop" merging operation and then
sends the merged results to the controller.

PROCEDURE Nest_loop_merge

FOR each record in the source record set DO
    FOR each record in the target record set DO
        IF the merging condition is satisfied
        THEN
            form a result record
        END IF
    END FOR
END FOR
END PROCEDURE Nest_loop_merge


Figure 3.1  The Nest-loop Merge Procedure.
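
As a rough illustration of the "nest-loop" merge of phase (5),
the sketch below also counts comparisons, which makes the
quadratic cost of this approach visible. The dictionary record
layout, the sample data and the pair-shaped result record are
assumptions of the example, not part of the MBDS design.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def nest_loop_merge(source: List[Record], target: List[Record],
                    attr1: str, attr2: str
                    ) -> Tuple[List[Tuple[Record, Record]], int]:
    """Compare every source record with every target record."""
    results = []
    comparisons = 0
    for s in source:
        for t in target:
            comparisons += 1
            if s[attr1] == t[attr2]:      # the merging condition
                results.append((s, t))    # form a result record
    return results, comparisons


source = [{"Employee_name": "Adams", "Wages": 3},
          {"Employee_name": "Baker", "Wages": 5}]
target = [{"Rank": 3, "Wages": 5},
          {"Rank": 7, "Wages": 9},
          {"Rank": 5, "Wages": 3}]
pairs, comparisons = nest_loop_merge(source, target, "Wages", "Wages")
```

With two source records and three target records the loop
always performs 2 * 3 = 6 comparisons, regardless of how many
pairs actually satisfy the merging condition.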

2.  The Implementation Based on Sorting and Matching

The idea of this implementation is based on the
following inference.

Since the retrieve-common operation is simply a merging
operation on two files of record sets, if we can have
these two files presorted by the values of their common
attributes, then the merging operation may be efficiently
performed by matching the values of the common
attributes of the records of these two files.

There  are  two  possible  alternatives  to  perform  the 
sort-match  algorithm. 

(a)  The  backends  do  all  of  the  sorting  and  matching 
operations. 

(b)  The  backends  and  the  controller  share  the  sorting  and 
matching  operations. 

Alternative (b) will increase the workload of the
controller and contradict the design goals of MBDS, and
is therefore eliminated from consideration. Only
alternative (a) will be examined. Alternative (a)
accomplishes the retrieve-common operation in four phases.

(1) Each backend retrieves, sorts and stores its own source
records and target records separately, and then
broadcasts either set of records to the other
backends. (Let us assume that the target records are
transmitted.)

(2) Each backend receives and merges the incoming
non-local target records into its own local target
records.

(3) Each backend performs the matching operation over its
own portion of source records and the entire set of
target records (from all the backends).

(4) The backends send the results to the controller.
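
The sort-and-match idea for one backend can be sketched as
below. The sketch sorts in memory with Python's built-in sort,
which is an assumption made for brevity (a real backend would
need an external sort for large record sets); the tuple record
layout and the function names are likewise hypothetical.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, int]  # (label, join value); a hypothetical record layout


def join_key(rec: Record) -> int:
    return rec[1]


def sort_match(local_source: List[Record],
               sorted_target_runs: List[List[Record]]
               ) -> List[Tuple[Record, Record]]:
    src = sorted(local_source, key=join_key)                   # phase 1
    # Phase 2: merge the local and incoming sorted target runs into one.
    tgt = list(heapq.merge(*sorted_target_runs, key=join_key))
    results = []
    i = j = 0
    while i < len(src) and j < len(tgt):                       # phase 3
        ks, kt = join_key(src[i]), join_key(tgt[j])
        if ks < kt:
            i += 1
        elif ks > kt:
            j += 1
        else:
            # Find the run of target records sharing this join value.
            j_end = j
            while j_end < len(tgt) and join_key(tgt[j_end]) == ks:
                j_end += 1
            # Pair every source record with that value against the run.
            while i < len(src) and join_key(src[i]) == ks:
                results.extend((src[i], t) for t in tgt[j:j_end])
                i += 1
            j = j_end
    return results


local_source = [("a", 2), ("b", 1), ("c", 2)]
target_runs = [[("x", 2)], [("z", 2), ("y", 3)]]   # each run already sorted
matched = sort_match(local_source, target_runs)
```

Once both files are sorted, a single forward pass over each is
enough; only records with equal join values are ever paired.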

3.  The Implementation Based on Bucket-Hashing

This  implementation  strategy  attempts  to  speed  up 
the  comparison  and  merge  by  hashing  records  into  small 
groups  (the  buckets  of  the  hashing  table)  which  contain 
records  with  common  attribute  values,  so  that  the  time 
complexity of the merging operation may be reduced.

A hashing function applied to the common attribute
values is used to hash records into buckets. The bucket
numbers are consecutive integers. Instead of using primary
and overflow areas, the buckets use one or more fixed-size
blocks to store records. The number of blocks may vary
among buckets. Details of the hashing table, the buckets
and the blocks will be described in the next chapter.
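
A small sketch of such a table is given below. It is only an
assumed layout for illustration: the class name, the
modulo hashing function on integer join values, and the tiny
block size are hypothetical and are not taken from the MBDS
design.

```python
from typing import List


class BucketTable:
    """Buckets numbered 0..num_buckets-1, each a list of fixed-size blocks."""

    def __init__(self, num_buckets: int, block_size: int) -> None:
        self.num_buckets = num_buckets
        self.block_size = block_size
        self.buckets: List[List[List[int]]] = [[] for _ in range(num_buckets)]

    def bucket_number(self, join_value: int) -> int:
        # Assumed hashing function: remainder of an integer join value.
        return join_value % self.num_buckets

    def insert(self, join_value: int) -> None:
        blocks = self.buckets[self.bucket_number(join_value)]
        # No overflow area: a full (or missing) block just gets a new block.
        if not blocks or len(blocks[-1]) == self.block_size:
            blocks.append([])
        blocks[-1].append(join_value)


table = BucketTable(num_buckets=3, block_size=2)
for value in [0, 3, 6, 1]:
    table.insert(value)
```

After these inserts bucket 0 holds two blocks, bucket 1 holds
one block and bucket 2 holds none, so the number of blocks
does vary among buckets.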

Those source records and target records within the
same bucket will be examined and merged if the merging
condition is satisfied. This alternative can also be broken
into two subalternatives.

(a)  One  common  hashing  table  is  used  for  both  source  and 
target  record  sets. 

(b) Two separate tables are used, one for each record set.

a.   One  Common  Hashing  Table 

This  alternative  is  accomplished  by  each  backend 
in  four  phases: 

(1) All local source records will be hashed and stored
into blocks according to their hashed values. These
blocks (and therefore buckets) are termed source blocks
(buckets).

(2)  After  all  the  local  source  records  have  been  hashed, 
the  local  target  records  are  hashed  one  at  a  time  and 
buffered.  If  the  target  record  is  hashed  into  an 
empty  source  bucket,  then  it  is  buffered  for 
transmitting  to  other  backends.  Otherwise,  all  the 
records  in  the  source  bucket  will  be  retrieved  and 
merged  with  that  target  record  only  if  the  merging 
condition  is  satisfied.  The  results  are  first 
buffered   and  then  sent  to  the  controller. 

(3) Since the non-local target records may arrive at a
backend while the backend is processing some other
records, each backend will place these incoming
records on a predefined secondary storage area.

(4) Each backend retrieves the non-local target records
from the secondary storage area and processes them in
the same way as it does its local target records.
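
The four phases of the one-table alternative might look as
follows for a single backend. This is a sketch under
assumptions: records are hypothetical (label, join value)
tuples, the hashing function is a simple remainder, and an
in-memory list stands in for the secondary storage area that
spools the non-local target records.

```python
from typing import List, Tuple

Record = Tuple[str, int]  # (label, join value); a hypothetical record layout


def build_source_table(local_source: List[Record],
                       num_buckets: int) -> List[List[Record]]:
    table: List[List[Record]] = [[] for _ in range(num_buckets)]
    for s in local_source:
        table[s[1] % num_buckets].append(s)
    return table


def probe(table: List[List[Record]],
          target_records: List[Record]) -> List[Tuple[Record, Record]]:
    results = []
    for t in target_records:
        # An empty source bucket yields no comparisons at all.
        for s in table[t[1] % len(table)]:
            if s[1] == t[1]:              # the merging condition
                results.append((s, t))
    return results


def one_table_backend(local_source: List[Record], local_target: List[Record],
                      nonlocal_target: List[Record], num_buckets: int = 4
                      ) -> List[Tuple[Record, Record]]:
    table = build_source_table(local_source, num_buckets)   # phase 1
    results = probe(table, local_target)                    # phase 2
    spool = list(nonlocal_target)                           # phase 3
    results += probe(table, spool)                          # phase 4
    return results


local_source = [("s1", 1), ("s2", 5), ("s3", 2)]
local_target = [("t1", 5), ("t2", 3)]
nonlocal_target = [("t3", 1), ("t4", 9)]
merged = one_table_backend(local_source, local_target, nonlocal_target)
```

Note that t4 hashes into a non-empty bucket yet merges with
nothing: sharing a bucket only means the values may be equal,
so the merging condition is still checked record by record.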

b.   Separate Hashing Tables

This   alternative   is   accomplished    in   three 
phases. 

(1) The backends will hash and store their own source
records and target records into two separate hashing
tables by a common hashing function. After all of the
target records have been hashed and stored, each
backend will broadcast the hashed results of its
target records (i.e., the bucket number and the
records associated with that bucket number) to all of
the other backends.

(2) Upon receiving all of the target information from the
other backends, each backend stores those target
records into appropriate buckets according to their
bucket numbers.

(3)  The  backends  perform  the  merge  operation  on  the  local 
source  records  and  the  entire  set  of  target  records 
and  send  the  results  to  the  controller.  The  procedure 
is  shown  in  Figure  3.2. 
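
The three phases with separate tables can be sketched as
follows for one backend. The tuple record layout, the
remainder hashing function, the function names and the sample
data are assumptions of this illustration; the bucket-by-bucket
loop follows the Hashing_merge procedure.

```python
from typing import List, Tuple

Record = Tuple[str, int]  # (label, join value); a hypothetical record layout
Table = List[List[Record]]


def hash_records(records: List[Record], num_buckets: int) -> Table:
    """Phase 1: hash records into the buckets of their own table."""
    table: Table = [[] for _ in range(num_buckets)]
    for rec in records:
        table[rec[1] % num_buckets].append(rec)
    return table


def absorb(target_table: Table, incoming_table: Table) -> None:
    """Phase 2: incoming records arrive grouped by bucket number,
    so they are appended without being hashed again."""
    for number, bucket in enumerate(incoming_table):
        target_table[number].extend(bucket)


def hashing_merge(source_table: Table,
                  target_table: Table) -> List[Tuple[Record, Record]]:
    """Phase 3: merge bucket by bucket, skipping empty buckets."""
    results = []
    for number in range(len(source_table)):
        source_bucket = source_table[number]
        target_bucket = target_table[number]
        if source_bucket and target_bucket:
            for s in source_bucket:
                for t in target_bucket:
                    if s[1] == t[1]:
                        results.append((s, t))
    return results


NUM_BUCKETS = 4
source_table = hash_records([("s1", 1), ("s2", 5), ("s3", 2)], NUM_BUCKETS)
target_table = hash_records([("t1", 5), ("t2", 3)], NUM_BUCKETS)
absorb(target_table, hash_records([("t3", 1), ("t4", 9)], NUM_BUCKETS))
merged = hashing_merge(source_table, target_table)
```

Only bucket 1 holds both source and target records here, so
the nested comparison runs for that single bucket and the
other three are skipped.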

PROCEDURE Hashing_merge

FOR the bucket_value = min_value TO max_value DO
    IF the buckets of both tables are not empty
    THEN
        retrieve all the records from both buckets
        perform merge operation based on
            the straightforward algorithm
    END IF
END FOR

END PROCEDURE Hashing_merge


Figure 3.2  The Hashing_merge Procedure.

4.   A Comparison of the Three Implementation Approaches

In this section we compare and analyze these
implementation approaches. Since the backends work in
parallel, our analysis focuses only on how much time it
takes for one backend to carry out one particular strategy.
There are common operations that each backend performs under
every strategy, so the time complexities for these operations
can be ignored when comparing the implementation strategies.
The times of these common operations are:

(1) the time to process the records for the source request,
which involves determining which records of the
database satisfy the query, projecting the
attribute-value pairs of the target-list of the
satisfied records and forming a source record set;

(2) the time to process the records for the target
request, which involves determining which records of
the database satisfy the query, projecting the
attribute-value pairs of the target-list of the
satisfied records and forming a target record set;

(3) the time to broadcast the local target records to the
other backends; and

(4) the time to send the merged results to the controller.

The following notation is introduced to simplify the ensuing
analysis.

Cs : Cardinality of the source record set in one backend.
Ct : Cardinality of the target record set in one backend.
Cb : Average number of records in a bucket.
M  : Number of backends.
B  : Number of index entries in the hashing table.
Ti : Average time to read (write) a block of records from
     (to) secondary storage.
Tb : Average time to read (write) a record from (to) a
     bucket.
Tc : Average time to compare the common attribute values
     of two records.
Th : Average time to hash a record.
Tm : Average time to merge two records.

a.   An Analysis for the Straightforward Implementation

We recall that there are five phases in this
implementation, as discussed in a previous section.

Phase 1: Since there are Cs local source records in each
backend, stored Cb records to a block, the time complexity
for storing them into the secondary storage is:

Ti*(Cs/Cb).

Phase 2: Since there are Ct local target records in each
backend, the time complexity for storing them into the
secondary storage is:

Ti*(Ct/Cb).

Phase 3: The time complexity for this phase is ignored.

Phase 4: Since each backend receives (M-1)*Ct target
records from the other backends, the time complexity for
storing them in the secondary storage is:

(M-1)*(Ct/Cb)*Ti.

Phase 5: Records are merged in this phase. There are Cs
source records and M*Ct target records in each backend.
Each block of the source records is compared and merged
with all of the target records. It takes Ti to bring one
block of source records into the primary memory from the
secondary storage and M*(Ct/Cb)*Ti for the entire target
record set.

It takes Cb*Tb to access one block of source records and
M*Ct*Tb to access all of the target records. The time
complexity for comparing one block of the source records
and all of the target records is:

Cb*M*Ct*Tc.

We further assume that a fraction k of the target records
participates in the merging operation. The time complexity
for merging one block of source records and all of the
target records becomes:

k*M*Ct*Tm.

The total time complexity for processing one block of source
records in this implementation is:

[Ti + M*(Ct/Cb)*Ti] + (Cb + M*Ct)*Tb + (Cb*M*Ct*Tc) + (k*M*Ct*Tm).

There are Cs/Cb blocks of source records in each backend;
therefore, the time complexity of this alternative is:

(Cs/Cb)*{[Ti + M*(Ct/Cb)*Ti] + (Cb + M*Ct)*Tb
         + (Cb*M*Ct*Tc) + (k*M*Ct*Tm)}

or

(M*Cs*Ct)*[Ti/(Cb*Cb) + (Tb + k*Tm)/Cb + Tc] + Ti*(Cs/Cb) + Cs*Tb.

Because Cs may be equal to Ct and M is a small constant, the
time complexity may be further simplified to be

O(Cs*Ct) or O(Cs^2).
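
The quadratic growth can be checked numerically with a short
sketch. It evaluates the Phase 5 cost from the per-block
quantities stated in the prose above (block I/O, record
access, comparison and merging). All parameter values are
hypothetical and were chosen only to make the arithmetic
concrete; they are not measurements of MBDS.

```python
def per_block_cost(Ct: float, M: int, Cb: int, Ti: float, Tb: float,
                   Tc: float, Tm: float, k: float) -> float:
    """Cost of merging one block of source records with all M*Ct targets."""
    io = Ti + M * (Ct / Cb) * Ti        # one source block + whole target set
    access = Cb * Tb + M * Ct * Tb      # record accesses
    compare = Cb * M * Ct * Tc          # every source/target pair compared
    merge = k * M * Ct * Tm             # fraction k of targets merged
    return io + access + compare + merge


def straightforward_cost(Cs: float, Ct: float, M: int, Cb: int, Ti: float,
                         Tb: float, Tc: float, Tm: float, k: float) -> float:
    """There are Cs/Cb source blocks, each costing per_block_cost."""
    return (Cs / Cb) * per_block_cost(Ct, M, Cb, Ti, Tb, Tc, Tm, k)


# Hypothetical parameters (arbitrary time units).
PARAMS = dict(M=4, Cb=10, Ti=30.0, Tb=0.01, Tc=0.001, Tm=0.01, k=0.1)

small = straightforward_cost(1000, 1000, **PARAMS)
large = straightforward_cost(2000, 2000, **PARAMS)
ratio = large / small
```

Doubling both Cs and Ct multiplies the cost by a factor just
under four with these values, which is the behaviour expected
of an O(Cs*Ct) method.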

b.   An Analysis for the Sort-Matching Implementation

We will analyze each phase of this implementation approach.

Phase 1: Each backend sorts its two record sets and
broadcasts the sorted target record set to the other
backends. Due to the large size of the record sets, the
sorting operation cannot be done by using an internal sorting
algorithm. There are several external sorting algorithms
which can sort the local source records and the local target
records with the time complexities of O(Cs*log(Cs)) and
O(Ct*log(Ct)), respectively. However, these algorithms all
have some limitations: they either use a special hardware
configuration or run different software among processors
[Refs. 9,10].

Because we do not want to put limitations on the
hardware configuration of MBDS or to use different software
among the backends, this alternative is eliminated from our
consideration.

c.   An Analysis for the Bucket-Hashing Implementation

In order to further simplify our analysis, we
assume that the local source records and target records can
be evenly hashed across all the buckets of the hashing
tables and each bucket will contain only one block of local
source records or one block of local target records. First,
we analyze the alternative that uses only one hashing table.

Phase 1: Each source record needs to be hashed and
written into a bucket according to its hashed value. This
includes getting the block of that bucket from the secondary
storage, writing the record into the block and returning the
block to the secondary storage. Therefore, the time
complexity for each backend to hash and store the source
records is:

Cs*(Th + Tb + 2Ti).

Phase 2: Every time a target record is hashed,
the bucket with that hashed value is checked. If the bucket
is not empty, then all the source records in that bucket
will be retrieved into the primary memory, compared with the
target record and merged with it if their common attribute
values are equal. The time complexity for bringing one bucket
(block) of source records into primary memory is Ti. The
time complexity for accessing those source records from the
block and comparing them with that target record is:

Cb*(Tb + Tc).

Suppose that the probability of hashing a target record into
a non-empty bucket is p and the probability of satisfying
the merging condition is f. Then the time complexity for
each backend to process one local target record is:

Th + p*[Ti + Cb*(Tb + Tc + f*Tm)].

Because we assume the source records are evenly hashed
across the buckets of the hashing table, p is equal to 1.
There are Ct local target records in each backend, so
the time complexity for each backend to process its local
target records is:

Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.

Phase 3: Each backend receives (M-1)*Ct target
records from other backends. The time complexity for
storing these records back to the secondary storage is:

(M-1)*(Ct/Cb)*Ti.

Phase 4: It takes (M-1)*(Ct/Cb)*Ti for each backend
to retrieve all the non-local target records from the
secondary storage into the primary memory. The time
complexity for processing these records is:

(M-1)*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.

The time complexity of this phase is:

(M-1)*(Ct/Cb)*Ti + (M-1)*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.

The total time complexity of this alternative
for a backend is:

Cs*(Th + Tb + 2Ti) + 2(M-1)*(Ct/Cb)*Ti
+ M*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.

Now, we analyze the other alternative, which uses
two separate hashing tables.

Phase 1: The source records and the target
records will be hashed, grouped into the buckets of separate
hashing tables and then placed onto the secondary storage.
The time complexity for each backend to process its local
records is:

(Cs + Ct)*(Th + Tb + 2Ti).

Upon receiving the target records from the other
backends, each backend will insert those incoming records
into the hashing table of the target records and store them
back to the secondary storage. Since those non-local target
records are grouped and sent by their bucket numbers, the
insertion time is so small that it may be ignored. By using
an inverted list, the time complexity for each backend to
return those incoming target records to the secondary
storage is:

(M-1)*(Ct/Cb)*Ti.

Phase 2: Records of these two hashing tables
will be processed one bucket at a time. For any bucket
number (i.e., a table entry), if the buckets of both hashing
tables are not empty, then all blocks of the records of both
buckets will be read into the primary memory for the merging
operation. It takes Ti to bring one bucket of source
records (in this case, one block) into the primary memory
and M*Ti for one bucket of target records (M blocks). The
time complexity for accessing, comparing and possibly
merging one bucket of source records with one bucket (M
blocks) of target records (not including the disk I/O time)
will be:

Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].

The expected time complexity for all buckets will be:

(Cs/Cb)*Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].

Therefore, the total time complexity for this alternative
is:

(Cs + Ct)*(Th + Tb + 2Ti) + (M-1)*(Ct/Cb)*Ti
+ (Cs/Cb)*Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].
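
To compare the two bucket-hashing totals concretely, the
sketch below evaluates both expressions as given in the text.
The parameter values are hypothetical, chosen only for
illustration, and the function names are not part of MBDS.

```python
def one_table_cost(Cs: float, Ct: float, M: int, Cb: int, Ti: float,
                   Tb: float, Tc: float, Th: float, Tm: float,
                   f: float) -> float:
    """Total for the one-common-table alternative."""
    per_target = Th + Ti + Cb * (Tb + Tc + f * Tm)   # cost per target record
    return (Cs * (Th + Tb + 2 * Ti)
            + 2 * (M - 1) * (Ct / Cb) * Ti
            + M * Ct * per_target)


def two_table_cost(Cs: float, Ct: float, M: int, Cb: int, Ti: float,
                   Tb: float, Tc: float, Th: float, Tm: float,
                   f: float) -> float:
    """Total for the two-separate-tables alternative."""
    per_bucket = Cb * (Tb + M * Cb * (Tb + Tc + f * Tm))  # CPU per bucket pair
    return ((Cs + Ct) * (Th + Tb + 2 * Ti)
            + (M - 1) * (Ct / Cb) * Ti
            + (Cs / Cb) * per_bucket)


# Hypothetical parameters (arbitrary time units).
PARAMS = dict(M=4, Cb=10, Ti=30.0, Tb=0.01, Tc=0.001, Th=0.005,
              Tm=0.01, f=0.1)

one_table = one_table_cost(1000, 1000, **PARAMS)
two_tables = two_table_cost(1000, 1000, **PARAMS)
doubled = two_table_cost(2000, 2000, **PARAMS)
```

With these values the one-table total is dominated by the
block read charged to every one of the M*Ct target records,
so the two-table total comes out smaller. Doubling Cs and Ct
exactly doubles the two-table total, since every term is
linear in Cs or Ct.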


      One Common Table                Two Separate Tables

Th    Cs + M*Ct                       Cs + Ct
Tb    Cs + M*Ct*Cb                    (M*Cb+2)*Cs + Ct
Tc    M*Ct*Cb                         Cs*M*Cb
Ti    2Cs + M*Ct + 2(M-1)*(Ct/Cb)     2(Cs+Ct) + (M-1)*(Ct/Cb)
Tm    M*Ct*Cb*f                       Cs*M*Cb*f

Figure 3.3  The Time Complexities of the
Bucket-Hashing Implementations.


A summary of the time complexity in terms of Th, Tb, Tc,
Ti and Tm for these two subalternatives is shown in
Figure 3.3. As shown in Figure 3.3, the alternative which
uses two separate tables is better than the one which
employs only one table. Since Cb and M are constants, f is
smaller than 1 and Ct may be equal to Cs, we can further
simplify the time complexity of the two-separate-tables
subalternative to be:

O(Cs+Ct) or O(Cs).

d.   The Conclusion for Our Implementation Approach

A summary of the analysis of those
implementation approaches in terms of time complexity is
shown in Figure 3.4. Clearly, the one based on
bucket-hashing with two separate hashing tables is the best
approach. Therefore, our implementation will be based on
that approach. The details of design and implementation
will be discussed in the next chapter.

Straightforward         O(Cs^2)
Sorting-Matching        Not considered
Bucket-Hashing          O(Cs)

Figure 3.4  Time Complexity of Different Implementations.

IV.  DETAILED DESIGN FOR IMPLEMENTING RETRIEVE-COMMON
OPERATION INTO MBDS

In the previous chapter, a bucket-hashing based
implementation approach was selected for implementing
the retrieve-common operation in MBDS. In this chapter, we
focus on specifying the details of that approach and discuss
any of the existing MBDS software which will be affected by
this implementation. Our primary goal is to use the
existing software as much as possible and to minimize the
effects which may be caused by the implementation.

The operations of the retrieve-common request may be
described in four phases. First, the user's request must be
preprocessed so that all backends can be informed by an
appropriate message. This is the request-preprocessing
phase. Second, the records of both the source and the
target record sets are retrieved before the merging
operation. This is the record-retrieving phase. Third,
those retrieved records are hashed on the values of their
join attributes and stored into a hashing table according to
their hashed values (i.e., the bucket numbers). We recall
that there are two hashing tables, one for the source
records and one for the target records. Further, the hashed
local target records are broadcast to the other backends.
This is the hashing-and-storing phase. Lastly, the hashed
records of the source buckets and the hashed records of the
target buckets are compared and merged bucket by bucket.
The merged results are sent to the controller
from all of the backends. This is the merging phase. The
controller then forwards those results to the host computer.

The operations of the first and second phases can be
done by the existing system software with minor
modifications. However, in order to accomplish the
operations of the last two phases, we must design a new set
of procedures, which we refer to as the hashing
module. In the remainder of this chapter, we first describe
the hashing module, and then the operations of those four
phases.

A.   THE HASHING MODULE

This module is designed to accomplish the operations of
the last two phases of the retrieve-common request. There
are three procedures within this module: the
hashing procedure, the bucket-block tracking procedure and
the merging procedure. In this section, we first discuss
two alternatives for implementing this
module. After choosing the better alternative, we then
describe the three procedures of the hashing module.

1.  Alternatives for Implementing the Hashing Module

There are two alternatives that may be used for
implementing the hashing module. In the first alternative,
the hashing module is implemented as a separate process of
the backend. This alternative modifies the existing
process structure of a backend by introducing a sixth
process and its associated communication paths into each
backend. In the second alternative, the hashing module is
implemented as part of the existing record processing
process (RECP). This alternative leaves the existing backend
process structure unchanged.

a.   As  a  Separate  Process 

In this alternative, the hashing module is
designed as a separate process of the backend. The inputs
to the hashing module are either the local source or target
records from the local RECP or the non-local target records
from the RECPs of the other backends. The outputs from the
hashing module are the merged results, which are sent to the
controller. The transfer of records between processes
(i.e., non-local target records from "Put Pcl" to the
hashing module or the local source records or the local
target records from the local RECP to the hashing module) is
accomplished using the interprocess message capabilities of
each backend. The new process structure of each backend
with the additional communication paths is shown in Figure 4.1.
Since the hashing module is an independent process, the
effects of this implementation on the other processes of
MBDS may be minimized.

Figure 4.1   Hashing Module as a Separate Process.

b.   As a Procedure within Record Processing

In this alternative, the hashing module is
designed as a group of procedures that are added to RECP.
In Figure 4.2 we show the structure of the hashing module
with RECP. The local records (both the source records and
the target records) are retrieved by the physical data
operation of RECP of each backend. Once the records are
retrieved, they are sent to the hashing module. The
non-local target records are received by RECP from the other
backends and then passed to the hashing module. The merged
results are then sent to the controller. With modularized
programming, the hashing module may be independently
implemented with a minimal effect on the original RECP
software.

Figure 4.2   Hashing Module as Part of RECP.

c.   Comparison of These Two Alternatives

Both alternatives can be easily implemented with
minimal effect on the existing system. The difference
between these two alternatives is the way that the local
records are passed from the "physical data operation" to the
hashing module. In alternative (a), the records are passed
as an interprocess message. In alternative (b), the records
are passed as a parameter of a procedure call. We choose
alternative (b) for three reasons.

(1)  Message-passing between two processes within a
backend is slower than parameter-passing. In
message-passing, both processes have to access a
common memory to put (or get) a message. The access
time, coupled with the time required for the sender to
place a message in the common memory and for the
receiver to fetch it from the common memory, is
considerable. In parameter-passing, only the logical
address of the record buffer is passed between the
procedures, which is much simpler and faster.

(2)  Even if message-passing within a computer is extremely
fast, the number of messages (i.e., records) that must
be routed between the two processes is large, so the
total cost of routing them is still considerable.

(3)  The extra communication paths required by alternative
(a) (i.e., the communication paths among the hashing
module and the other MBDS processes) increase the
number of messages passed within a backend and among
backends. By increasing the inter-backend and
intra-backend communication, we may adversely affect
the overall performance of a backend.

2.   The Hashing Procedure

This procedure is used to perform the hashing
operation on the values of the join attributes of the input
records. The inputs to the procedure are either the local
source records or the local target records, which are
received from the physical-data-operation subprocess of
RECP. The outputs from the procedure are the input records
and their hashed values (i.e., the bucket numbers), which
are sent to the bucket-block tracking procedure with the
request id for further processing.

The hashing operation is done by the hashing
functions of this procedure. Since the type of the values
of the join attributes may be either an integer or a
character string, we have designed two hashing functions in
this procedure. Generally, a good hashing function should
satisfy the following three requirements:

(1)  All  of  the   records  should  be  evenly   distributed  into 
buckets  of  the  hashing  table; 

(2)  The  chance  of  hashing  different   records  into  the  same 
bucket  should  be  minimized;  and 

(3)  The  hashing  computation  should  be  fast. 

These  requirements  are  closely  related  to  the  number  of 
buckets  and  the  hashing  algorithm  which  is  used  in  the 
hashing  function. 

a.   The Number of the Buckets

A hashing table with a large number of buckets
is useful for a number of reasons. First, the large number
of buckets may reduce the chance of hashing different
records into the same bucket. Second, the number of
records in each bucket is also quite small, and this will
reduce the access time during merging. However, it would be
impractical to have a table with a very large number of
bucket entries, where each bucket would only contain a few
records. When the table becomes exceedingly large, a
substantial cost is incurred to maintain the bucket index.
The bucket index of a hashing table is an array of
fixed-size bucket entries. There is a bucket entry for each
bucket to keep track of the records which are stored in that
bucket. Therefore, the number of buckets (and therefore the
bucket entries) can be computed by the following equation:

Let X be the size of the bucket index (measured in bytes),
    Y be the size of a bucket entry (measured in bytes),
then the number of buckets is (X / Y).

For example, if the size of the bucket index of a hashing table
is 8K bytes and the size of each bucket entry is 8 bytes,
then the number of bucket entries for that hashing table is
1K, i.e., 1024.

How should we determine the size of the bucket
index of our hashing table? Since MBDS allows the
concurrent execution of different user transactions, there
may be a number of retrieve-common requests being processed
by the system. Each of the retrieve-common requests
requires two hashing tables, one table for the source record
set and one table for the target record set. Because of the
potentially large number of hashing tables concurrently in
use, it will be necessary to store the bucket indexes of the
tables in the secondary storage and stage them into the
primary memory on demand. To keep the staging of the bucket
index efficient, it is desirable to have the size of the
bucket index be a multiple of the unit of disk I/O
transfer. For example, if the unit of disk I/O
transfer (which is typically the track size) is 4K bytes, then
the size of the bucket index shall be M*4K bytes, where
M = 1, 2, 3, ... In our case, we choose 16K bytes to be the
size of the bucket index of our hashing table, yielding 2048
entries (therefore, 2048 buckets) in the hashing table, each
with a bucket entry size of 8 bytes.
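
Both figures quoted above (1024 entries for an 8K-byte index and 2048 entries for the 16K-byte index) follow directly from the X / Y rule. The short Python sketch below simply reproduces that arithmetic; the function name and the divisibility check are illustrative assumptions and are not part of the MBDS software.

```python
def number_of_buckets(index_size_bytes: int, entry_size_bytes: int) -> int:
    """Return X / Y: how many fixed-size bucket entries fit in the bucket index."""
    if entry_size_bytes <= 0:
        raise ValueError("bucket entry size must be positive")
    if index_size_bytes % entry_size_bytes != 0:
        # Assumption: the index is sized as a whole number of entries.
        raise ValueError("index size should be a multiple of the entry size")
    return index_size_bytes // entry_size_bytes


K = 1024
example_8k = number_of_buckets(8 * K, 8)    # the 8K-byte example in the text
chosen_16k = number_of_buckets(16 * K, 8)   # the 16K-byte index chosen in the text
```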

b.   The Hashing Algorithm

Since the value type of the join attribute may
be either an integer or a character string, we have designed
two hashing functions, one for each value type.

(1)  The Hashing Algorithm for the Integer-Valued
Attributes. In order to evenly distribute the values of all
join attributes into the buckets and to minimize the
collisions, we use the information about the maximum and
minimum values of the join attributes. This information is
maintained in the record templates. The hashing algorithm
for the integer attribute value is described as follows.

Step 1: Get the MAX (maximum) and MIN (minimum) values of
        the join attribute from the record template. Let
          X = The_number_of_buckets_in_hashing_table
Step 2: If MAX - MIN < X
          then go to step 4
          else Temp1 = (MAX - MIN) Div X
Step 3: Get the input record and let
          Y = The_value_of_the_join_attribute
          bucket_number = (Y - MIN) Div Temp1
        go to step 5
Step 4: Get the input record and let
          Y = The_value_of_the_join_attribute
          bucket_number = Y - MIN
Step 5: Return the bucket number to the calling procedure.
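
The integer hashing steps above can be sketched in Python as follows. This is a minimal illustration, not the MBDS code: the function and parameter names are hypothetical, and the final clamp is an added safeguard that does not appear in the steps. It is included because, with integer division, (Y - MIN) Div Temp1 can reach or exceed the number of buckets for values near MAX.

```python
def integer_bucket(value: int, min_value: int, max_value: int,
                   num_buckets: int = 2048) -> int:
    """Map an integer join-attribute value to a bucket number."""
    span = max_value - min_value
    if span < num_buckets:
        # Step 4: the value range fits in the table, so use the offset directly.
        return value - min_value
    # Steps 2 and 3: scale the offset by the width of one bucket.
    width = span // num_buckets
    bucket = (value - min_value) // width
    # Added safeguard (assumption, not in the original steps):
    # keep the result inside 0 .. num_buckets - 1.
    return min(bucket, num_buckets - 1)
```

For example, with MIN = 0, MAX = 4096 and 2048 buckets, the bucket width is 2, so the value 1000 is placed in bucket 500, while the value 4096 would compute to 2048 and is clamped to the last bucket, 2047.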


(2)  The Hashing Algorithm for the Character-Valued
Attributes. The record template does not provide the maximum
and the minimum values for the character-valued attributes as
it does for integer-valued attributes. In order to minimize
collisions and distribute records evenly into buckets, we
design a lookup table, which is an array with 2048
character-string elements, to perform the hashing function.
The number of the elements is equal to the number of the
entries in the bucket index of the hashing table. The
values of the join attributes of the input records are
searched against the contents of the lookup table to obtain
the bucket values. The binary search algorithm is used to
minimize the searching time of the lookup table.

The contents of the entries of the lookup
table are created in the following way:

(1)  Get an English dictionary with more than 2049 pages;

(2)  Divide the number of pages by the number of the buckets
(in our case the number is 2048);

(3)  Let the result be x.y, where x and y are positive
decimal digits;

(4)  Pick up the last word of every x.y pages from the
dictionary and place the first four characters as an
entry in the lookup table; and

(5)  If the length of the selected word is less than 4,
fill the word with trailing blanks.

We use only the first four characters to compare the values
of join attributes for two reasons. First, we believe that
there are very few English words that will have the same
first four letters. Second, we want to reduce the
primary-memory requirements for the lookup table.

The algorithm for the character-valued
attributes is as follows.

Step 1: Let MIN = 0 and MAX = 2047.
Step 2: Get the input record and let
          X = The_value_of_the_join_attribute;
Step 3: If X > look_up_table[MAX]
          then bucket_number = MAX, go to step 5.
Step 4: Use binary search to find the bucket number.
Step 5: Return the bucket number to the calling procedure.
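
A minimal Python sketch of the lookup-table search is given below. The text does not say which bucket a value receives when it falls between two table entries; this sketch assumes the bucket is the index of the first entry that is greater than or equal to the value, which matches the idea that each entry is the last word of a range of dictionary pages. The function name and the small four-entry table in the usage note are hypothetical.

```python
from bisect import bisect_left


def string_bucket(value: str, lookup_table: list[str]) -> int:
    """Map a character-valued join attribute to a bucket number.

    lookup_table must be sorted and hold 4-character entries
    (shorter words padded with trailing blanks).
    """
    # Only the first four characters take part in the comparison.
    key = value[:4].ljust(4)
    last = len(lookup_table) - 1
    if key > lookup_table[last]:
        # Step 3: values beyond the last entry go to the last bucket.
        return last
    # Step 4: binary search for the first entry >= key (assumed rule).
    return bisect_left(lookup_table, key)
```

With the illustrative table `["appl", "dog ", "moon", "zebr"]`, the value "cat" falls in bucket 1, "apple" in bucket 0, and "zoo" (which sorts after the last entry) in bucket 3.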

3.   The Bucket-Block Tracking Procedure

The input to this procedure may be either the local
records (either the source records or the target records)
with their bucket numbers from the hashing procedure or the
non-local target records grouped by their bucket values from
the other backends. The outputs from the procedure are the
logical addresses of the hashing tables of the source
request and the target request, which are sent to the
merging procedure for the merging operation. The
bucket-block tracking procedure performs three functions:

(1)  maintaining  a  global  table  to  keep  track  of  the 
logical  addresses  of  the  hashing  tables  for  all 
retrieve-common  requests  which  are  currently  being 
processed  in  the  system; 

(2)  maintaining a hashing table for the current request
and keeping track of all of the buckets and blocks of
that hashing table; and

(3)  storing  the  input  records  into  appropriate  buckets  and 
blocks  according  to  their  bucket  values. 

In  order  to  provide  a  better  understanding  of  this 
procedure,  we  first  introduce  the  structures  of  the  blocks, 
the  buckets,  the  hashing  table  and  the  global  table.  We  then 
discuss  how  these  functions  are  accomplished. 

a.   The  Structure  of  a  Block 

Each block is divided into two parts: the header
and the body. The header has two fields. The first field is
used to record the length (in bytes) of the body, i.e., the
total size in bytes of all of the records currently stored
in this block. The second field is used to store the logical
address of the next block whose records have the same bucket
value as this block. If there is no other block of the
bucket, then there is a null address in this field. The body
is used to store the hashed records and their common
attribute values. Blocks which are in the same bucket are
maintained as an inverted list and tracked by their logical
addresses. The structures of the block and its header are
shown in Figure 4.3.

B.   The Structure of the Block Header

Figure 4.3   The Structures of a Block and Its Header.


b.   The Structure of a Bucket

As mentioned in Chapter II, instead of using
primary and overflow areas, each bucket uses fixed-size
blocks to store records. The number of blocks per bucket
may vary among different buckets. The bucket entry is used
to indicate the status and to keep track of the blocks of
that bucket.

Each bucket entry in the bucket index has two
parts: the status and the logical address of the block
currently being used. The status is used to indicate
whether or not the bucket is empty. The size of the bucket
entry is 8 bytes, where 2 bytes are used for the status and
6 bytes are used for the logical address, which is
represented by a tuple consisting of the logical disk
number, the logical cylinder number and the logical track
number. The structure of a bucket entry is shown in Figure 4.4.

Figure 4.4   The Structure of a Bucket-entry.
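
The 8-byte bucket entry can be illustrated with Python's `struct` module. The text fixes only the totals (2 bytes of status, 6 bytes of logical address); the split of the address into three 2-byte fields, the little-endian byte order and the status codes below are assumptions made for this sketch.

```python
import struct

# Assumed layout: 2-byte status, then 2 bytes each for the logical disk,
# cylinder and track numbers, little-endian, 8 bytes in total.
BUCKET_ENTRY = struct.Struct("<HHHH")

EMPTY, NOT_EMPTY = 0, 1  # hypothetical status codes


def pack_bucket_entry(status: int, disk: int, cylinder: int, track: int) -> bytes:
    """Encode one bucket entry as 8 bytes."""
    return BUCKET_ENTRY.pack(status, disk, cylinder, track)


def unpack_bucket_entry(raw: bytes) -> tuple:
    """Decode 8 bytes into (status, disk, cylinder, track)."""
    return BUCKET_ENTRY.unpack(raw)


# A 16K-byte bucket index holds this many entries.
ENTRIES_PER_INDEX = (16 * 1024) // BUCKET_ENTRY.size
```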


c.  The  Structure  of  the  Hashing  Table 

A hashing table is an array of bucket entries.
We anticipate that the retrieve-common operation will be
implemented on a SUN Workstation running the UNIX operating
system, with a 16K-byte unit of disk I/O. Using the equation
from the previous subsection, we can compute the number of
bucket entries for our hashing table to be 2048.

d.  The  Global  Table 

Since MBDS allows concurrent processing during
the retrieval operation, there may be several
retrieve-common requests in the system. We need a table
that keeps track of the logical addresses of the
hashing tables for each retrieve-common request. Each entry
of the global table contains two parts: the request id of
the request and the logical address of the hashing table for
that request. The request id consists of the traffic id,
which is the unique identifier of a traffic unit [Ref. 11:
p. 41], and the request number, which indicates the sequence
of the request in the traffic unit. Each entry of the
global table is created whenever a new hashing table is
created, and deleted when that request has completed
processing. The structure of the global table is shown in
Figure 4.5.

Figure 4.5   The Structure of the Global Table.


e.   The Sequence of the Operations of the
Bucket-block Tracking Procedure

The steps of the sequence to accomplish the
operations of this procedure are described as follows.

Step 1:  Create and initialize the global table.

Step 2:  Check the request ID of the input records with the
global table to see if the input records belong to
a new request. If they do, then allocate a hashing
table for that request, initialize the bucket
index and store the logical address of the hashing
table into the global table. Otherwise, get the
existing hashing table into the primary memory
using the logical address information provided by
the global table.

Step 3:  Extract a record from the input buffer. If the
record is the first record of that request, then
go to step 10.

Step 4:  If the bucket value of this record is the same as
the previous one, then go to step 8.

Step 5:  Store the block which contains the previous record
back to the secondary storage.

Step 6:  Get the desired bucket entry (table entry) for the
record by its hashed bucket-value. Check the
status of the bucket. If it is "empty", then go
to step 11.

Step 7:  Get the currently used block by its logical
address in the bucket entry.

Step 8:  If there is space in the block that is available
for storing this record, then go to step 12.

Step 9:  Get a new block, put the current logical address
of the bucket entry into the "logical address of
next block" field of the block header. Then,
update the bucket entry with the logical address
of this new block. Go to step 12.

Step 10: Get the desired bucket entry by its hashed
bucket-value, update the status of that bucket
entry to "not empty".

Step 11: Get a new block and put its logical address into
the bucket entry.

Step 12: Store the record into the block and update the
"length of record" field of the block header.

Step 13: Repeat steps 3 to 12 until all records have
been processed.
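
The core of the sequence above (locate the bucket entry, reuse the current block if the record fits, otherwise chain a new block in front of it) can be sketched in Python. This is a simplified in-memory illustration under stated assumptions: it omits the staging of blocks and bucket indexes to and from secondary storage (steps 4, 5 and 7), it uses object references in place of logical addresses, and all class, method and parameter names are hypothetical.

```python
class Block:
    """A fixed-size block: a two-field header plus a body of records."""

    def __init__(self, next_block=None):
        self.length = 0               # header: bytes currently used in the body
        self.next_block = next_block  # header: next block of the same bucket, or None
        self.records = []             # body


class HashingTable:
    """An array of bucket entries; None stands for the "empty" status."""

    def __init__(self, num_buckets=2048, block_capacity=4096):
        self.entries = [None] * num_buckets
        self.block_capacity = block_capacity  # assumed block body size in bytes

    def store(self, bucket, record):
        current = self.entries[bucket]
        if current is None or current.length + len(record) > self.block_capacity:
            # Steps 9 and 11: get a new block, link it to the old one (if any)
            # and make the bucket entry refer to the new block.
            current = Block(next_block=current)
            self.entries[bucket] = current
        # Step 12: store the record and update the length field of the header.
        current.records.append(record)
        current.length += len(record)

    def records_in(self, bucket):
        """Yield every record of a bucket by following the block chain."""
        block = self.entries[bucket]
        while block is not None:
            yield from block.records
            block = block.next_block


class GlobalTable:
    """Maps a request id to the hashing table of that request (steps 1 and 2)."""

    def __init__(self):
        self.tables = {}

    def table_for(self, request_id):
        if request_id not in self.tables:
            self.tables[request_id] = HashingTable()
        return self.tables[request_id]

    def release(self, request_id):
        # The entry is deleted when the request has completed processing.
        self.tables.pop(request_id, None)
```

Because a new block is always linked in front of the previous one, the most recently filled block of a bucket is reached first when the chain is followed.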

Notice that the block is not immediately
returned to the secondary storage after the insertion of one
input record. Since the records in MBDS are stored by
clusters, it is very likely that records within the same
cluster will be retrieved again. Therefore, by keeping the
current block in the primary memory, we may save one store
and one read operation if the next input record is
retrieved from the same cluster and hashed into the same
bucket (that is, if the two records have the same bucket value).

4.   The Merging Procedure

This  procedure  is  used  to  perform  the  merging 
operation.  The  inputs  to  this  procedure  are  the  logical 
addresses  of  the  hashing  tables  of  the  source  request  and 
the  target  request,  which  come  from  the  bucket-block 
tracking  procedure.  The  outputs  from  this  procedure  are  the 
merged  results,  which  are  sent  to  the  controller. 

The  algorithm  of  the  merging  procedure  is  as 
follows. 

Step 1:  Reserve a result buffer.

Step 2:  Get the hashing tables of the source request and
the target request by their logical addresses.

Step 3:  Compare the bucket statuses of these two hashing
tables bucket by bucket. If both buckets contain
records for a particular bucket number, then
retrieve all the records associated with this
particular bucket value from both tables.

Step 4:  Apply the straightforward merging algorithm on
those retrieved records. Insert merged results
into the result buffer.

Step 5:  If the result buffer is full, then send its
contents to the controller.

Step 6:  Repeat steps 3, 4 and 5 until all the buckets have
been processed.

Step 7:  Free the result buffer.
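
The merging steps above can be sketched in Python as follows. Several points are assumptions made for illustration: each hashing table is modelled as a dictionary from bucket number to a list of records, each record is a dictionary of attribute-value pairs, the "straightforward merging algorithm" is taken to be a pairwise comparison of the records of matching buckets, and a final flush of a partly filled buffer is added before the buffer is freed. All names are hypothetical.

```python
def merge_tables(source_table, target_table, join_attr, buffer_size, send):
    """Merge source and target records bucket by bucket on join_attr.

    send is called with a list of merged records whenever the result
    buffer is full, and once more at the end if anything remains.
    """
    result_buffer = []                                  # Step 1
    # Step 3: only buckets that are non-empty in both tables can match.
    for bucket in sorted(source_table.keys() & target_table.keys()):
        for source_rec in source_table[bucket]:
            for target_rec in target_table[bucket]:
                # Step 4: pairwise comparison of join-attribute values.
                if source_rec[join_attr] == target_rec[join_attr]:
                    result_buffer.append({**source_rec, **target_rec})
                    if len(result_buffer) == buffer_size:   # Step 5
                        send(result_buffer)
                        result_buffer = []
    if result_buffer:
        # Assumed final flush before the buffer is freed (Step 7).
        send(result_buffer)


source = {0: [{"dept": "A", "name": "x"}], 1: [{"dept": "B", "name": "y"}]}
target = {0: [{"dept": "A", "budget": 10}, {"dept": "C", "budget": 5}],
          1: [{"dept": "B", "budget": 7}],
          2: [{"dept": "Z", "budget": 1}]}

batches = []
merge_tables(source, target, "dept", buffer_size=8, send=batches.append)
```

In this usage example, bucket 2 is skipped because it is empty in the source table, and the record with dept "C" is compared but not merged because no source record has that value.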

B.   THE OPERATIONS OF THE FOUR PHASES

In  this  section  we  discuss  the  operations  of  each  phase 
of  the  retrieve-common  request  and  the  software  which  will 
be  affected  by  those  operations. 

1.   The Request-preprocessing Phase

a.  The  Operations 

The operations of this phase include parsing the
user's transaction (or request). If the transaction
(request) is correctly parsed, then the controller will
compose an appropriate message to inform the backends to
begin execution for the request. Since the retrieve-common
request is conceptualized and executed as two retrieval
operations, the parser has to parse the user's request and
transform the request from the form of a single request to
the form of a transaction with two requests.

b.  The  Affected  Software 

Basically, operations of this phase can be done
by the existing Request Preparation process. However, the
software for this process must be modified as follows:

(1)  The parser should be able to recognize the newly added
syntax and correctly parse the request;

(2)  The composer should be able to form a new message to
inform PP and all of the backends so that they can
perform the desired operation;

(3)  New message types are added for processing the
retrieve-common request; and

(4)  PP and all of the backends should be able to recognize
and process the newly created message for the
retrieve-common request.

2.   The Record-retrieving Phase

a.       The    Operations 

Operations of this phase include the address
generation and the record retrieval for both the source
request and the target request. These two requests will be
processed by DM in the same way as the other four types of
requests. As mentioned in the previous chapter, the target
records are processed after the source records. In order to
separate the records of these two requests, DM will first
send the source request and its associated address set to
RECP, and hold the target request and its address set
until receiving a message from RECP indicating that all
source records have been retrieved.

The record-retrieving operation is performed by
the physical-data-operation subprocess in RECP as a regular
retrieve request. Instead of sending the retrieved records
to the controller, control logic is used to route them to
the hashing module for hashing and subsequent merging.

b.   The Affected Software

Most of the operations of this phase are done by
DM, CC and the Physical Data Operation of RECP in each
backend. The required changes are:

(1)  We need to add control logic into DM so that the
address information of the source and target requests
will not be sent to RECP together; and

(2)  We need to add a new procedure to handle the
retrieve-common request and control logic to route
the results to the hashing module instead of to PP.

3.   The Hashing-and-storing Phase

This is the most important part of the
retrieve-common request. All of the records are prepared in
this phase, so they can be merged in the next phase. The
operations of the hashing-and-storing phase include:

(1)  performing hashing operations on the local records,

(2)  table maintenance and bucket-block tracking
operations, and

(3)  broadcasting (and receiving) the target records and
their bucket-values to (from) the other backends.

a.  The  Hashing  Operations 

This operation is performed by the hashing
procedure of the hashing module. Upon receiving the local
records from the previous phase, the hashing procedure will
check the record template to get the value type of the
common attribute values and then apply an appropriate
hashing function to hash the common attribute values. The
records and their hashed bucket-values will then be passed
to the bucket-block tracking procedure for further
processing.

b.   Table-maintenance and Bucket-block Tracking
Operation

This operation is done by the bucket-block
tracking procedure. A global table is maintained to store
the addresses of all of the hashing tables for all of the
different retrieve-common requests which are currently being
processed by the system. Whenever a new retrieve-common
request is encountered, the bucket-block tracking procedure
will create a new hashing table for that request. The
logical address of the newly created hashing table is then
stored into the global table. The hashing table will be
deleted when the request is complete. Records are stored
into buckets according to their hashed values. The
information in the bucket entries and the block headers is
maintained and updated by the bucket-block tracking
procedure as described in the previous section.

c.   Broadcasting and Receiving Target Records
Between Backends

After the local target records have been hashed
and processed, each backend will buffer its local target
records (retrieved from the target-hashing table with their
bucket values) and broadcast them to the other backends.
Upon receiving those non-local target records, each backend
will store them into the target-hashing table by their
bucket values. A checklist is used to ensure that the
target information from all of the other backends has been
received.
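
The checklist can be pictured as a set of backends from which target records are still outstanding. The sketch below is only one possible reading; the class name, the use of backend identifiers as set members and the method names are assumptions, since the text does not describe the checklist's structure.

```python
class BroadcastChecklist:
    """Tracks which other backends have not yet delivered their target records."""

    def __init__(self, own_id, all_backend_ids):
        # Every backend except this one is expected to send its target records.
        self.pending = set(all_backend_ids) - {own_id}

    def mark_received(self, backend_id):
        """Record that the target records of backend_id have arrived."""
        self.pending.discard(backend_id)

    def complete(self):
        """True once target records from all other backends have been received."""
        return not self.pending
```

Under this reading, a backend would start the merging phase only after `complete()` returns true.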

d.  The    Affected    Software 

Since the operations of this phase are done by
the hashing module, RECP is affected to the extent that this
module is integrated into the RECP process. No other
existing software will be affected.

4.   The Merging Phase

This is the last phase of the retrieve-common
operation. The local source records and the entire set of
target records are compared and merged.

a.  The Operation

The operations are performed by the merging
procedure of the hashing module. Because the records of
both tables are unsorted, they are merged by using the
straightforward algorithm. The merged results are stored in
a result buffer and then sent to the controller.

b.  The    Affected    Software 

Since this phase is also done by the hashing
module, RECP is affected to the extent that this module is
integrated into the RECP process. No other existing system
software is affected.

V.  THE  IMPLEMENTATION

In this chapter, we describe how the retrieve-common
request is integrated into the MBDS system. To successfully
perform the integration, it is necessary to modify a portion
of the MBDS software. Therefore, this chapter also
discusses how the MBDS software is modified for the
integration and implementation of the retrieve-common
operation.

In the remainder of this chapter we first describe the
modified processes of the controller. Second, we describe
the modified processes of each backend. Then, we present
the modified MBDS message-passing facilities. Finally, we
trace the execution sequence of the retrieve-common request
in terms of the types of messages that are passed among the
MBDS processes.

A.   THE  MODIFIED  PROCESSES  OF  THE  CONTROLLER 

1.   The Request Preparation Process (REQP)

There are two subprocesses in REQP, namely the
parser and the composer. The parser parses the requests and
checks for syntax errors. The composer transforms the
correctly parsed requests into the form required for
processing at the backends.

a.   The  Parser 

The parser does both the lexical and the
syntactical analyses of the ABDL transaction (or requests).
The input to the parser is either a request or a
transaction. The outputs from the parser are the error
messages to the test interface, the aggregation operators to
PP and the correctly parsed requests to the composer.

The lexical analysis is done by the lexical
analyzer produced by LEX [Ref. 11: p. 42]. The input to
LEX is a specification of the tokens of the language (i.e.,
the tokens of ABDL) in the form of regular expressions and a
set of subroutines which specify the actions to be taken
upon recognition of the tokens. The syntactical analyzer is
generated by YACC (Yet Another Compiler Compiler) [Ref. 12].
The input to YACC is a specification which includes the
declarations of tokens' names, the rewriting rules of the
grammar, and the action program. YACC produces a C program
to determine whether the input ABDL transactions (requests)
are syntactically correct.

For the parser to correctly parse the users'
retrieve-common requests, we have made several modifications
to the original parser subprocess. These modifications are
listed below.

(1)  Regular expressions for LEX.

We have added a new set of regular expressions so
that the lexical analyzer can recognize the
retrieve-common request and generate appropriate
tokens which in turn can be recognized and used by
YACC.

(2)  Grammar rules for YACC.

A new set of rules has been added into the original
ABDL grammar so that the parser can recognize those
tokens which are generated for the retrieve-common request
and organize those tokens by these newly created
rules.

(3)  The request type.

We have added a new request type, the retrieve-common
request, so that the parsed transaction can be
correctly identified and properly executed by the
composer and the other processes of MBDS.
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(4)  The  action  program. 

The  input  of  the  retrieve-common  request  to  the  parser 
is  in  the  form  cf  a  single  request.  The  parser  should 
re  able  to  parse  this  request  and  generate  a 
transaction  of  two  retrieval  requests  (each  of  the 
retrieve-common  request  type).  If  the  join  attribute 
is  not  in  the  target  list  (of  the  source  or  the  target 
request)  ,  the  action  program  inserts  the  join 
attribute  into  the  head  cf  the  target  list.  The  extra 
attribute- value  pairs  (i.e.,  the  join  attribute-value 
pairs)  of  the  retrieved  records,  which  are  going  to  be 
deleted  by  the  merging  procedure,  are  not  to  be  in  the 
results  so  that  the  merged  results  contains  only  the 
desired  attribute-value  pairs.  The  newly  added 
regular  expressions,  grammar  rules  and  the  SSL  for 
the  modified  action  program  are  provided  in  Appendix 
A. 
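
The rewriting performed by the action program can be
illustrated with a short sketch. The Python below is only an
illustration under stated assumptions, not the SSL of
Appendix A: the dictionary layout of a request and the names
split_retrieve_common, with_join_attribute and
join_attr_is_extra are hypothetical, and the query strings
are placeholders rather than ABDL syntax.

```python
def with_join_attribute(target_list, join_attr):
    """Return (target list, inserted?) with join_attr at the head if it was missing."""
    if join_attr in target_list:
        return list(target_list), False
    return [join_attr] + list(target_list), True


def split_retrieve_common(request):
    """Turn one retrieve-common request into a transaction of two retrievals."""
    transaction = []
    for role in ("source", "target"):
        part = request[role]
        target_list, inserted = with_join_attribute(
            part["target_list"], part["join_attr"])
        transaction.append({
            "type": "RETRIEVE_COMMON",
            "role": role,
            "query": part["query"],            # placeholder, not ABDL syntax
            "target_list": target_list,
            "join_attr": part["join_attr"],
            # Hypothetical flag: lets a later merging step know that the
            # join attribute is an extra that must be deleted again.
            "join_attr_is_extra": inserted,
        })
    return transaction


request = {
    "source": {"query": "employee file", "target_list": ["Name"],
               "join_attr": "Dept"},
    "target": {"query": "department file", "target_list": ["Dept", "Manager"],
               "join_attr": "Dept"},
}
transaction = split_retrieve_common(request)
```

In this example the source request gains Dept at the head of
its target list, while the target request, which already
names Dept, is left unchanged.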

1.   The  Composer 

The composer receives the correctly parsed
requests from the parser and formats them into the required
message format. Then, the composer broadcasts the formatted
messages to all of the backends for execution. We have
modified the original composer program so that the composer
can correctly reformat the retrieve-common request.

2.   The Post Processing Process (PP)

The post processing process includes the aggregate
post operation and the reply monitor. The functions of PP
are described in [Ref. 11: p. 27]. The aggregate post
operation is not modified. The only modification in the
reply monitor is to recognize the new request type for the
retrieve-common request.

B.   THE MODIFICATION OF THE BACKEND PROCESSES

As described in Chapter II, one of the design issues of
MBDS is to assign as much work as possible to the backends.
Consequently, there are more changes in the processes of
each backend than changes in the controller. The affected
processes are directory management and record processing.

1.   The Directory Management Process (DM)

DM receives the new transaction message for the
retrieve-common request from the request composer and then
performs a number of directory operations, which include
attribute search, descriptor search, cluster search, address
generation and directory table maintenance. From our
earlier discussion, we know that the source and target
requests of a retrieve-common request should not be
processed concurrently by RECP. Therefore, DM first
processes the source request and sends the request and its
addresses to RECP. The target request is held in DM until
RECP notifies DM that the source request has finished
execution.

At what stage of the DM processing do we hold the
target request? There are several alternatives for holding
the target request in DM. These alternatives are listed
below.

(1)  Hold the target request without performing any
directory operation.

(2)  Hold the target request after it completes attribute
search.

(3)  Hold the target request after it completes attribute
search and descriptor search.

(4)  Hold the target request after it completes attribute
search, descriptor search and cluster search.

(5)  Hold the target request after it completes attribute
search, descriptor search, cluster search, and address
generation.

Alternatives 2, 3, 4, and 5 generate status and
directory information for the target request which must be
held somewhere. Due to the large number of possible
attributes, the status and directory information may be too
large to be kept in the primary memory, i.e., it will have
to be written back to the secondary storage. The extra disk
I/O for moving the status and directory information in and
out of the primary memory not only slows the
retrieve-common operation, but also increases the program
complexity and causes many unnecessary changes to the
existing software. Therefore, we choose alternative (1) to
process the target request.

The algorithm for the modified DM is as follows.

Step 1:  Get the next message from the message queue and
         find the sender of the message.
Step 2:  If the sender is the controller, then go to step 5.
Step 3:  If the sender is RECP, then go to step 8.
Step 4:  If the sender is CC, then go to step 11.
Step 5:  If this is not a retrieve-common transaction, then
         go to step 11.
Step 6:  Identify and separate the source request and the
         target request from the transaction. Hold the
         target request and perform the directory
         processing on the source request.
Step 7:  Send the source request with its address set to
         RECP. Go to step 1.
Step 8:  If this is not the message which indicates the
         completion of retrieving all the source records,
         then go to step 11.
Step 9:  Get the corresponding target request and perform
         directory processing on that target request.
Step 10: Send the target request with its address set to
         RECP.
Step 11: Perform the original DM operation.

The SSL for the modified DM is provided in Appendix E.
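
The dispatching logic of these steps can be sketched as
follows. This is a minimal Python illustration and not the
SSL of the appendix: the class DirectoryManagement, the
message fields (sender, kind, tid) and the placeholder
directory_processing are all assumptions. The sketch also
assumes that DM returns to the message queue after sending
the target request to RECP.

```python
class DirectoryManagement:
    """Toy stand-in for DM; only the dispatching logic is modelled."""

    def __init__(self):
        self.held_targets = {}   # transaction id -> target request being held
        self.sent_to_recp = []   # (request, address set) pairs passed to RECP
        self.original = []       # messages left to the unmodified DM logic

    def directory_processing(self, request):
        # Placeholder for attribute, descriptor and cluster search
        # plus address generation.
        return ["addresses of " + request["name"]]

    def handle(self, message):
        sender, kind = message["sender"], message.get("kind")
        if sender == "controller" and kind == "RETRIEVE_COMMON":
            # Steps 6 and 7: hold the target, process and send the source.
            self.held_targets[message["tid"]] = message["target"]
            source = message["source"]
            self.sent_to_recp.append(
                (source, self.directory_processing(source)))
        elif sender == "RECP" and kind == "SOURCE_RETRIEVE_FINISHED":
            # Steps 9 and 10: release the held target request.
            target = self.held_targets.pop(message["tid"])
            self.sent_to_recp.append(
                (target, self.directory_processing(target)))
        else:
            self.original.append(message)   # Step 11


dm = DirectoryManagement()
dm.handle({"sender": "controller", "kind": "RETRIEVE_COMMON", "tid": 7,
           "source": {"name": "source"}, "target": {"name": "target"}})
held_after_first = dict(dm.held_targets)
dm.handle({"sender": "RECP", "kind": "SOURCE_RETRIEVE_FINISHED", "tid": 7})
dm.handle({"sender": "CC", "kind": "LOCK_GRANTED"})
```

Because the target request is held before any directory
operation is performed on it (alternative 1), the only state
DM keeps per transaction is the request itself.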

2.   The Record Processing Process (RECP)

RECP receives the requests and their address sets
from DM and performs the physical data operations on those
requests. The original physical-data-operation subprocess
includes a control function and a subfunction for each type
of request. The subfunctions are invoked by the control
function according to the type of request being processed.

In order to process the retrieve-common request, we
have made two modifications to RECP:

(1)  adding a new subfunction, the retrieve-common
subfunction, into the physical-data-operation
subprocess; and

(2)  adding a new subprocess, the hashing module, into
RECP.

a.      The    Retrieve-Common   Subfunction 

The purpose of the retrieve-common subfunction
is to direct the flow of control in the
physical-data-operation subprocess so that the
retrieve-common request can be processed correctly. The
differences between the retrieve-common subfunction and the
retrieve subfunction can be summarized as follows.

(1)  The retrieve subfunction sends the retrieved records
to PP, whereas the retrieve-common subfunction
sends the retrieved records to the hashing module.

(2)  In addition to sending a message to CC to indicate the
completion of the retrieval of physical data (as the
retrieve subfunction does), the retrieve-common
subfunction sends a message to notify DM that all
the source records have been processed.

The algorithm for the retrieve-common
subfunction is as follows.

Step 1:  Reserve a result buffer.

Step 2:  For each address in the set of tracks furnished
         by DM, fetch the track from the disk and place it
         in the track buffer in the primary memory.

Step 3:  Examine the records in the buffer one-by-one. If
         the record is marked for deletion, disregard it.
         If the record does not satisfy the query,
         disregard it. If a record satisfies the query,
         then extract the values for the attribute names in
         the target-list of the request and store this
         information in the result buffer.

Step 4:  When the result buffer is full, send the contents
         of the buffer to the hashing module.

Step 5:  Repeat steps 2, 3 and 4 until there are no more
         addresses for the request.

Step 6:  Send a message to CC to release the lock for this
         request. If this is a source request, then send a
         message to DM so that DM can process the target
         request.

Step 7:  Free the result buffer.
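
A minimal Python sketch of these steps is given below. It is
an illustration only: the function name retrieve_common, its
parameters, and the representation of tracks and records as
lists of dictionaries are assumptions, and the final flush of
a partly filled buffer is an added assumption that the steps
above do not state explicitly.

```python
def retrieve_common(tracks, satisfies, target_list, buffer_size,
                    send_to_hashing, send_message, is_source):
    result_buffer = []                                  # Step 1
    for track in tracks:                                # Step 2: one track per address
        for record in track:                            # Step 3
            if record.get("deleted") or not satisfies(record):
                continue
            result_buffer.append({a: record[a] for a in target_list})
            if len(result_buffer) == buffer_size:       # Step 4
                send_to_hashing(result_buffer)
                result_buffer = []
    if result_buffer:
        # Assumption: records left in a partly filled buffer are sent too.
        send_to_hashing(result_buffer)
    send_message("CC", "release lock")                  # Step 6
    if is_source:
        send_message("DM", "source retrieve finished")


batches, messages = [], []
tracks = [
    [{"Dept": "A", "Name": "x"}, {"Dept": "B", "Name": "y", "deleted": True}],
    [{"Dept": "C", "Name": "z"}, {"Dept": "A", "Name": "w"}],
]
retrieve_common(tracks, lambda r: r["Dept"] != "C", ["Dept", "Name"], 2,
                batches.append,
                lambda dest, text: messages.append((dest, text)),
                is_source=True)
```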

The SSL for the modified control function and the
retrieve-common subfunction is provided in Appendix C.

b.   The Hashing Module

The hashing module performs the hashing and
merge operations. The merged results are sent to the
controller. The module is invoked by the retrieve-common
subfunction of the physical-data-operation subprocess.
There are three procedures within this module: the hashing
procedure, the bucket-block tracking procedure and the
merging procedure.

(1)  The Hashing Procedure. The hashing
procedure receives the records from the retrieve-common
subfunction of the physical-data-operation subprocess and
performs the hashing function on the value of the join
attribute of each record. The records and their hashed
results are stored in a result buffer. When the buffer is
full, its contents are passed to the bucket-block tracking
procedure for further processing.

The algorithm for the hashing procedure is
as follows.

Step 1:  Reserve a result buffer.
Step 2:  Get the data type of the value of the join
         attribute from the record template.
Step 3:  Extract a record from the input buffer which is
         passed from the retrieve-common subfunction.
Step 4:  Apply the appropriate hashing function to hash the
         value of the join attribute of the record
         according to its data type (see Chapter IV).
Step 5:  Store the record and the hashed bucket value in
         the result buffer.
Step 6:  If the result buffer is full, then send the
         contents of the result buffer to the bucket-block
         tracking procedure.
Step 7:  Repeat steps 3, 4, 5 and 6 until there are no more
         records in the input buffer.
Step 8:  Free the result buffer.

The SSL for the hashing procedure is provided in Appendix D.
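
The following Python sketch illustrates the procedure. The
hashing functions themselves are defined in Chapter IV and
are not reproduced in this section, so the two functions used
here (remainder for numbers, sum of character codes for
strings) are stand-in assumptions, as are the names bucket_of
and hash_records and the final flush of a partly filled
buffer.

```python
def bucket_of(value, n_buckets):
    """Pick a hashing function according to the data type of the value."""
    if isinstance(value, int):
        return value % n_buckets
    if isinstance(value, float):
        return int(abs(value)) % n_buckets
    # Strings (and anything else): sum of character codes.
    return sum(ord(ch) for ch in str(value)) % n_buckets


def hash_records(records, join_attr, n_buckets, buffer_size, emit):
    result_buffer = []                                   # Step 1
    for record in records:                               # Step 3
        bucket = bucket_of(record[join_attr], n_buckets) # Step 4
        result_buffer.append((bucket, record))           # Step 5
        if len(result_buffer) == buffer_size:            # Step 6
            emit(result_buffer)
            result_buffer = []
    if result_buffer:
        # Assumption: a partly filled buffer is passed on at the end.
        emit(result_buffer)


passed_on = []
hash_records([{"Dept": "AB"}, {"Dept": "BA"}, {"Dept": "C"}],
             "Dept", 8, 2, passed_on.append)
```

Records with equal join-attribute values always receive the
same bucket value, which is what allows the later merge to
compare records bucket-by-bucket.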

(2)  The Bucket-block Tracking Procedure. This
procedure stores the records (both the source records and
the target records) into blocks according to their bucket
values and maintains one hashing table for the currently
processed request and one global table to store the
logical-hash-table addresses for all of the retrieve-common
requests in the system. The inputs to this procedure are the
records and their hashed bucket values, which come either
from the local hashing procedure or from the other backends.
A checklist is used to ensure that the hashed results of the
non-local target records are received from all of the other
backends. There is also an additional disk I/O buffer used
in this procedure to move the blocks of each bucket into and
out of the primary memory. The outputs from this procedure
are the logical addresses of the two hashing tables of the
source request and the target request, which are passed to
the merging procedure. The structures of the global table,
hashing table, bucket, and block have been described in
Chapter IV. After processing all of the local records, this
procedure groups the local target records together with
their bucket numbers, and then broadcasts them to all of the
other backends.

The algorithm for this procedure is as
follows.

Step 1:  Create the global table and reserve a disk I/O
         buffer.
Step 2:  Get an input buffer of records. If the input
         buffer contains source records, then go to step 5.
Step 3:  If the input buffer contains local target records,
         then go to step 6.
Step 4:  If the input buffer contains the target records
         received from the other backends, then go to step
         8.
Step 5:  Get the hashing table for the source request. Go
         to step 7.
Step 6:  Get the hashing table for the target request.
Step 7:  Store the record into a bucket and perform the
         bucket-block tracking operation (as described in
         Chapter IV). Go to step 9.
Step 8:  Perform the bucket-block tracking operations to
         insert these incoming records into the target
         hashing table.
Step 9:  Repeat steps 2 to 8 until all records have been
         processed.
Step 10: If the input buffer contains local target
         records, then retrieve the local target records
         from the target hashing table bucket-by-bucket
         and broadcast them (with the bucket number) to
         the other backends.
Step 11: If the input buffer contains non-local target
         records, then get the logical address of the
         hashing table of the source request. Pass the
         logical addresses of the hashing tables of the
         source request and the target request to the
         merging procedure for the merging operation.

The SSL for this procedure is provided in Appendix E.
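
The bookkeeping of this procedure can be sketched in Python
as follows. The sketch keeps everything in memory and is an
illustration only: the actual structures of the global table,
hashing table, bucket and block are those of Chapter IV, and
the class and method names used here (HashingTable,
BucketBlockTracker, store_local, outgoing_targets,
receive_remote) are hypothetical. The disk I/O buffer is not
modelled.

```python
class HashingTable:
    """Buckets made of fixed-size blocks (in-memory stand-in)."""

    def __init__(self, n_buckets, block_size):
        self.block_size = block_size
        self.buckets = [[[]] for _ in range(n_buckets)]

    def insert(self, bucket, record):
        blocks = self.buckets[bucket]
        if len(blocks[-1]) == self.block_size:
            blocks.append([])          # last block is full: start a new one
        blocks[-1].append(record)

    def records(self, bucket):
        return [r for block in self.buckets[bucket] for r in block]


class BucketBlockTracker:
    def __init__(self, n_buckets, block_size, other_backends):
        self.n_buckets = n_buckets
        self.block_size = block_size
        self.global_table = {}                 # (request id, role) -> hashing table
        self.checklist = set(other_backends)   # backends still expected to report

    def table(self, request_id, role):
        key = (request_id, role)
        if key not in self.global_table:
            self.global_table[key] = HashingTable(self.n_buckets,
                                                  self.block_size)
        return self.global_table[key]

    def store_local(self, request_id, role, hashed_records):
        # Steps 5 to 7: store local source or target records.
        for bucket, record in hashed_records:
            self.table(request_id, role).insert(bucket, record)

    def outgoing_targets(self, request_id):
        # Step 10: local target records grouped by bucket number.
        table = self.table(request_id, "target")
        grouped = {b: table.records(b) for b in range(self.n_buckets)}
        return {b: recs for b, recs in grouped.items() if recs}

    def receive_remote(self, request_id, backend, grouped):
        # Step 8: insert target records received from another backend.
        for bucket, records in grouped.items():
            for record in records:
                self.table(request_id, "target").insert(bucket, record)
        self.checklist.discard(backend)
        return not self.checklist      # True once every backend has reported


tracker = BucketBlockTracker(n_buckets=4, block_size=2, other_backends={"B2"})
tracker.store_local(1, "source", [(1, {"Dept": "D1"})])
tracker.store_local(1, "target",
                    [(1, {"Dno": "D1"}), (1, {"Dno": "D5"}), (1, {"Dno": "D9"})])
outgoing = tracker.outgoing_targets(1)
ready = tracker.receive_remote(1, "B2", {3: [{"Dno": "D3"}]})
```

In the sketch, the outgoing message is built before any
remote records are inserted, so that only local target
records are broadcast.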

(3)  The Merging Procedure. This procedure performs
three functions:

(1)  fetching the hashing tables of the source request and
the target request by their logical addresses, which
have been provided by the bucket-block tracking
procedure;

(2)  performing the merging operation on the records of
both hashing tables (as described in Chapter IV); and

(3)  sending the merged results to the controller.

The merged results contain only the
attribute-value pairs whose attribute names are specified in
the target-lists (of either the source request or the target
request). The extra attribute-value pairs (i.e., the join
attributes and their values, which have been added into the
target lists by the parser) are deleted by this procedure.
The SSL for the merging procedure is provided in Appendix E.
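
A bucket-by-bucket merge of this kind can be sketched in
Python as follows. The sketch is illustrative only; the
merging operation itself is the one described in Chapter IV.
The function name merge_buckets, the representation of a
hashing table as a dictionary from bucket number to records,
and the flags extra_source and extra_target are assumptions,
and the sketch further assumes that the two files do not
share attribute names other than through the join.

```python
def merge_buckets(source_buckets, target_buckets, source_join, target_join,
                  extra_source=True, extra_target=True):
    """Merge records whose join-attribute values are equal, bucket by bucket.

    extra_source / extra_target say whether the join attribute was added
    by the parser and must therefore be deleted from the results.
    """
    results = []
    for bucket, source_records in source_buckets.items():
        candidates = target_buckets.get(bucket, [])
        for s in source_records:
            for t in candidates:
                if s[source_join] != t[target_join]:
                    continue           # same bucket, different value
                merged = {k: v for k, v in s.items()
                          if not (extra_source and k == source_join)}
                merged.update({k: v for k, v in t.items()
                               if not (extra_target and k == target_join)})
                results.append(merged)
    return results


source_buckets = {1: [{"Dept": "D1", "Name": "Ann"}],
                  2: [{"Dept": "D2", "Name": "Bob"}]}
target_buckets = {1: [{"Dno": "D1", "Mgr": "Lee"},
                      {"Dno": "D9", "Mgr": "Kim"}]}
merged = merge_buckets(source_buckets, target_buckets, "Dept", "Dno")
```

Only records that fall into the same bucket are compared, so
the number of comparisons depends on the bucket sizes rather
than on the sizes of the two record sets.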

C.   THE MODIFIED MESSAGE-PASSING FACILITIES

In Chapter II we introduced the general format and
the different types of MBDS messages (see Figure 2.3 and
Figure 2.4). In order to accomplish the retrieve-common
request, we have added two new message types, which are
shown in Figure 5.1.

D.   EXECUTION OF A RETRIEVE-COMMON REQUEST AS VIEWED VIA
MESSAGE-PASSING

In this section we describe the sequence of actions for
executing the retrieve-common request as it moves through
MBDS. The sequence of actions is described in terms of the
types of messages passed between the MBDS processes: REQP,
PP, DM, RECP and CC. The order in which messages are passed
is denoted alphabetically ('a' is first). The digits
following the ordering letter give the message type as
shown in Figures 2.4 and 5.1.
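
The labelling convention can be decoded mechanically, as the
following Python sketch shows. The function name decode is
hypothetical; the labels in the example are taken from the
walk-through in this section.

```python
import re


def decode(label):
    """Split a label such as 'p16' into (position in sequence, message type)."""
    match = re.fullmatch(r"([a-z])(\d+)", label)
    if match is None:
        raise ValueError("not a message label: " + label)
    order = ord(match.group(1)) - ord("a") + 1   # 'a' is first
    return order, int(match.group(2))


labels = ["d6", "a1", "c4", "b3", "e20"]
in_order = sorted(labels, key=lambda label: decode(label)[0])
```

For example, the label p16 denotes the sixteenth letter of
the alphabet and a message of type 16.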

The sequence of actions for a retrieve-common request is
shown in Figure 5.2. First the retrieve-common request comes
to REQP from the host (a1). REQP sends two messages to PP:
the number of requests in the transaction (b3) and the
aggregate operator of the request (c4). The third message

Message Type:  (32) Hashed Target Records
Source:        Record Processing
Destination:   Record Processing (other backends)
Explanation:   This message contains the bucket numbers
               of the target hashing table and all of
               the target records associated with
               their buckets.

Message Type:  (33) Source Retrieve Finished
Source:        Record Processing
Destination:   Directory Management (same backend)
Explanation:   This message is used to notify Directory
               Management that all of the source
               records have been retrieved. DM can then
               begin processing the target request.

Figure 5.1   The New MBDS Message-Types.

sent by REQP is the parsed traffic unit, which goes to DM in
the backends (d6). DM sends the type-C attributes needed by
the retrieve-common request to CC (e20). Once an attribute
is locked and descriptor search can be performed, CC signals
DM (f26). DM then processes the source request (the target
request is now held). DM performs descriptor search and
signals CC to release the lock on that attribute (g23). DM
sends the descriptor ids for the request to the other
backends (h15). The DM processes in the other backends send
their descriptor ids to the DM process residing in this
backend (i15). DM then uses its own descriptors and the
descriptors received from the other backends to form
descriptor-id groups. DM now sends the descriptor-id groups
for the source request to CC (j21). Once the descriptor-id
groups are locked and cluster search can be performed, CC
signals DM (k27). DM then performs cluster search and
signals CC to release the locks on the descriptor-id groups
(m25). Next, DM sends the cluster ids for the retrieval to
CC (n22). Once the cluster ids are locked, and the request
can proceed with address generation and the rest of the
source-request execution, CC signals DM (o28). DM then
performs address generation and sends the source request and
the address set to RECP (p16). Once the retrieval request
has executed properly, RECP sends a message to DM to start
processing the target request (r33). DM processes the target
request in the same way as the source request (i.e., phases
e20 to p16). The retrieved records are processed by the
hashing module in RECP. Once the local target records have
been processed properly, the hashing module broadcasts the
hashed target records (grouped by bucket numbers) to the
other backends via RECP (s34). The hashing modules in the
other backends send their hashed target records to the
hashing module of this backend (t34). Once the comparing and
merging operations have been performed by the hashing
module, the results are sent to PP (u2). PP then forwards
the results to the host (v2).

Figure 5.2   The Sequence of Messages for Executing a
Retrieve-Common Request.

VI.   CONCLUSION

A.   REVIEW AND SUMMARY

The multi-backend database system (MBDS) in the
Laboratory for Database System Research at the Naval
Postgraduate School is designed to overcome the
performance-gain and capacity-growth problems of either the
traditional database system or the single-backend software
database system. The original MBDS supported four primary
operations, namely, RETRIEVE, DELETE, UPDATE and INSERT.
This thesis presented the design and implementation of the
fifth primary operation, the RETRIEVE-COMMON operation. The
retrieve-common operation is used to merge two files by
their common attribute values. Our major goal is to maximize
the utilization of the existing system and minimize the
effects on it.

We have analyzed several possible design alternatives
and then selected the best one as our design and
implementation approach. The key issues for the selection
are the adherence to the design requirements, the design
issues of MBDS and the time complexities of implementation.
Our design and implementation is based on the bucket-hashing
approach. Each backend performs a partial merge of its
portion of the source records with the entire set of target
records and sends its results to the controller. The
controller forwards the final results to the user at the
host computer.

Based on the selected design and implementation
approaches, the operations of the retrieve-common request
are executed in four phases: the request-preprocessing
phase, the record-retrieving phase, the hashing-and-storing
phase and the merging phase. The retrieve-common request
is first parsed by the parser into a transaction of two
retrieval requests (each of the retrieve-common request
type). Then, the parsed requests are reformatted into the
required message formats and broadcast to all the backends
by the composer of the controller. Each backend receives
the formatted messages of the transaction, separates the
source request and the target request, and then performs the
directory operations and retrieves the records according to
the queries specified in the requests. The retrieved
records of the source record set and the records of the
target record set are separately hashed on their common
attribute values and then stored into buckets of the source
hashing table and the target hashing table, respectively.
The hashed records of the source buckets and the records of
the target buckets are compared and merged bucket-by-bucket.
The merged results are sent to the controller from all of
the backends. The controller then forwards the results to
the host computer. In order to accomplish the operations of
the retrieve-common request, we have designed a hashing
module into the record-processing process of each backend.

For integrating our design into MBDS, we have made
several modifications. These affect:

(1)  the message-passing facilities,

(2)  the parser of the request-preparation process of the
controller, and

(3)  the directory-management process and the
record-processing process of each backend.

The algorithms for the modifications and the program
specifications (SSL) are also provided in Chapters IV and V
and the Appendices.

B.   FUTURE WORK

The next step in the design and implementation of the
retrieve-common operation is the modification of the MBDS
software according to the SSL given in the appendices. There
are two classes of modifications. First, existing software
is updated to reflect the changes necessary for the
retrieve-common operation. In the system, new message types
must be defined, the request-preparation and post-processing
processes of the controller are changed, and the
directory-management process is changed to correctly
sequence and execute the retrieve-common request. Second,
new software is written to handle the processing of the
retrieve-common request, i.e., the hashing module. In the
system, the software for the hashing module is coded, tested,
and integrated into the record-processing process of each
backend.

APPENDIX A
THE MODIFIED REQUEST PREPARATION PROGRAM SPECIFICATIONS

In this appendix, we present only the modified portions
of the Request Preparation process. The original SSL is in
[Ref. 11: p. 87].

A.   THE LEX MODIFICATIONS

/********************************************************
*  We have added the regular expression for the token   *
*  COMMON into LEX. The rest of LEX remains unchanged.  *
*  The original specification is in the lsrc file.      *
********************************************************/

(The original lsrc specifications.)

BY        {
              return (TOKBY);
          }
COMMON    {
              return (TOKCOM);
          }
"<="      {
              return (LS);
          }

(The original lsrc specifications.)

B.   THE YACC MODIFICATIONS

In this section, we present only the SSL for the
modified portion of the parser. The original program is in
the ysource file.

procedure yyparse();

/********************************************************
*  This procedure is used to parse the output of LEX.   *
*  The modification of the yyparse procedure converts   *
*  the retrieve-common request from a single request    *
*  into a transaction of two requests.                  *
*                                                       *
*  Data structures and variables used in this           *
*  procedure:                                           *
*  1. No new data structures are introduced by this     *
*     modification.                                     *
*  2. com_flag_1, com_flag_2, com_flag_3, com_flag:     *
*     Boolean variables which are used to indicate      *
*     the different conditions of the retrieve-common   *
*     request.                                          *
*  3. new_tbl_ptr:                                      *
*     A pointer to a request table.                     *
*     The request table is defined in the               *
*     commdata.def file as a REQtbl_definition          *
*     structure.                                        *
*  4. com_atrb_1, com_atrb_2:                           *
*     Character strings to hold the common attribute.   *
********************************************************/

/*  The following is the modified portion of yysource.  */

/*  Add a new token in the specification.  */
%token <str> TOKCOM   /*  common  */

/*  Add new derivations and program specifications.  */

transaction  :  beg_tran lines
                    /*  No changes in this part
                        of the transaction rule.  */
             |  beg_single_req line
                    if com_flag
                    then
                        /*  This is a retrieve-common
                            request.  */
                        Perform the operations which are
                        specified under the beg_tran
                        lines;
                    else
                        /*  Perform original operations.  */
                    end if;

end_req      :  EOR
                    /*  Clear the com_flags.  */
                    com_flag = false;
                    com_flag_3 = false;

req_forms    :  delete query
             |
             |  ...   /*  These are the
                          original derivations.  */
             |
             |  req_forms common target_list req_forms;

common       :  TOKCOM
                    perform CHECK_REQUEST_TYPE(req_tbl, OK);
                    /*  Check if the first request is
                        a retrieve.  */
                    if OK
                    then
                        com_flag = com_flag_1 = true;
                    else
                        perform ERROR_PROCEDURE;
                    end if;
attribute    :  LETTERFIRST
                    if com_flag_1
                    then
                        /*  This attribute is the common
                            attribute of the source
                            request. Copy the attribute
                            into com_atrb_1.  */
                        perform strcpy(com_atrb_1,
                                       attribute);
                        /*  Put the common attribute of
                            the source request into
                            the target list and
                            convert the request table from
                            the form of a single request to
                            the form of a transaction.  */
                        perform CONVERT(tbl_ptr->req_tbl,
                                        com_atrb_1,
                                        traf_id, req_cnt,
                                        new_tbl_ptr->req_tbl);
                        com_flag_2 = true;
                        com_flag_1 = false;
                        /*  com_flag = true  */
                    else
                        if com_flag_2
                        then
                            /*  This attribute is the
                                common attribute of the
                                target request.  */
                            perform strcpy(com_atrb_2,
                                           attribute);
                            com_flag_3 = true;
                            com_flag_2 = false;
                        else
                            if com_flag_3
                            then
                                /*  This is the first
                                    attribute of the target
                                    list of the target
                                    request.  */
                                insert com_atrb_2 into the
                                target request table;
                                insert the attribute into
                                the target request table;
                            end if;
                            /*  Perform the original
                                operations.  */
                        end if;
                    end if;

retrieve     :  TOKRETRIEVE
                    if com_flag_3
                    then
                        perform ERROR_PROCEDURE;
                    else
                        if com_flag
                        then
                            /*  Change the type to be
                                RETRIEVE_COMMON.  */
                        end if;
                    end if;
                    /*  Perform the original operations.  */

delete       :  TOKDELETE
                    if com_flag
                    then
                        perform ERROR_PROCEDURE();
                    else
                        /*  Perform the original operations.  */
                    end if;

insert       :  TOKINSERT
                    if com_flag
                    then
                        perform ERROR_PROCEDURE();
                    else
                        /*  Perform the original operations.  */
                    end if;

update       :  TOKUPDATE
                    if com_flag
                    then
                        perform ERROR_PROCEDURE();
                    else
                        /*  Perform the original operations.  */
                    end if;

/*  Perform the original operations.  */

end procedure yyparse;

procedure CONVERT(input:  source_req_table, source_com_atr,
                          traf_id, request_number,
                          index_req_ptr;
                  output: target_req_table, request_number,
                          index_req_ptr);

/********************************************************
*  This procedure is used to rearrange the contents     *
*  of the request table of a request which is the       *
*  source retrieve of a RETRIEVE_COMMON request.        *
*  This procedure performs the following tasks:         *
*  1. Rearrange the source request table.               *
*  2. Make the common attribute of the source request   *
*     the first attribute of the target list.           *
*  3. Create a request table for the target request     *
*     and return it to the calling procedure.           *
*                                                       *
*  Data structures and variables used in this           *
*  procedure are:                                       *
*  1. source_req_table, target_req_table:               *
*     The request tables of the source request and      *
*     the target request.                               *
*  2. new_table:                                        *
*     An array of Reqtbl_definition structures.         *
*  3. traf_id:                                          *
*     A character string which is the traffic id of     *
*     a transaction.                                    *
*  4. request_number:                                   *
*     An integer which is used to indicate the          *
*     number of requests in a traffic unit.             *
*  5. index_req_ptr:                                    *
*     A pointer to a parsed traffic unit, which is      *
*     an array of Reqtbl_definition structures.         *
*  6. source_com_atr:                                   *
*     A character string which is the common            *
*     attribute of the source request.                  *
********************************************************/

/*    Use    a   new    reguest    table,    new_table    to   hold    the 

contents   of    the  source_reg_table,    */ 
new_table^O ]   =    ECR; 

new_table[ 1  ]  =    str_to_num (traf _id)  ; 
cew_table[2]   =    reguest_number ; 

new_table[3]  =    rcuttype;    /*   Defined   in    yyparse  ()  .  */ 
new_table[4 ]    =    RETRIEVE_COMMON ; 
/*   Copy   the  contents  of   the  source  request    table    into 

the    new_table.    */ 
i    =    5; 
repeat 

new_table[i]    =  source_req_table[ i  ]  ; 

i   =    i+1  ; 
until   source_req_table[  i]    =   EOQ; 

/*   Insert    the   common   attribute   into    the   new_table.*/ 
new_table^i]  =    scurce_com_a tr ; 
i    =   i+1 ; 
/*   Ccpy   the    rest  of    the    source_reg_table   into 
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the   nev_table.    */ 
repeat 

new_table£i ]    =  source_req_table[  i- 1 ]; 

i  =    i+1 ; 
until    source_req_table[ i- 1  ]    =   null; 
/*    Put   an    end-of-request    marker,    EOR, 

into   the   new_table.    */ 
new_table[i]    =    ECR ; 

/*    Copy   the   new_tabie   into   the   source_req_table.    */ 
i   =    0; 
repeat 

source_reg_table[ i  ]   =    new_table[  i  ]; 

i   =   i+1 ; 
until   source_reg_table[ i ]   =    EOR; 

/*    Increase    the    request    number,    and  create   a    request 

table   for    the    target    reguest.    */ 
reguest_number  =    request_number+  1 ; 
perform    ALLOCATE_EEQ_TABLE  (target_req_table)  ; 
/*    Put    the   target_req_table    into    the 

parsed   traffic  unit.    */ 
index_req_ptr->req_tbl[  request_number-  1  ] 

=    target_req_table; 
/*    Return   the    request   number,      target_reg_table    and 

index_req_ptr    to   the    calling    procedure.    */ 
end    procedure  CONVERT; 
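
As an illustration only, task 2 of CONVERT (making the common attribute the first attribute of the target list) can be sketched in Python rather than in SSL. The flat list layout, the marker strings and the function name put_common_attribute_first are assumptions made for this sketch; in particular, we assume that the target list begins immediately after the end-of-query marker. It is not the MBDS request-table format.

```python
EOQ = "<EOQ>"  # hypothetical end-of-query marker
EOR = "<EOR>"  # hypothetical end-of-request marker


def put_common_attribute_first(req_table, common_attr):
    """Return a copy of req_table whose target list starts with common_attr.

    Assumed layout: [query tokens..., EOQ, target attributes..., EOR].
    """
    eoq = req_table.index(EOQ)
    eor = req_table.index(EOR, eoq)
    query = req_table[:eoq]
    targets = req_table[eoq + 1:eor]
    # The common attribute is placed directly after the EOQ marker.
    return query + [EOQ, common_attr] + targets + [EOR]


source = ["RETRIEVE", "FILE=Emp", EOQ, "name", "salary", EOR]
rearranged = put_common_attribute_first(source, "dept")
```

Building a new list instead of shifting entries in place mirrors the use of new_table in the specification, at the cost of one extra copy.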

procedure CHECK_REQUEST_TYPE(input: req_tbl; output: ok);
/*******************************************************
 * This procedure is used to check the syntax of a
 * retrieve-common request. If the request type is
 * not retrieve, set OK to false. Otherwise, set OK
 * to true. Return OK to the calling procedure.
 *******************************************************/
end procedure CHECK_REQUEST_TYPE;


procedure ERROR_PROCEDURE();
/*******************************************************
 * This procedure is used whenever there is a syntax
 * error in the request.
 * This procedure will print an error message and
 * terminate the parser operations.
 *******************************************************/
end procedure ERROR_PROCEDURE;

APPENDIX B
THE MODIFIED DIRECTORY MANAGEMENT PROGRAM SPECIFICATIONS

The original SSL for the Directory Management process is
in [Ref. 13: pp. 82-102]. In this appendix, we present only
those procedures which are affected by the retrieve-common
request.


procedure DM_ParsedTrafUnit();
/*******************************************************
 * This procedure is used when Request Preparation
 * (REQP) sends a traffic unit to Directory
 * Management (DM). The original procedure is in
 * the tu.c file.
 * We add an if statement to differentiate between
 * the retrieve-common request type and the other
 * request types.
 * No new variables are introduced in this procedure.
 *******************************************************/

/* Get a pointer to the parsed traffic unit. */
ti_ptr = DM_R$ParsedTrafUnit();
/* Get a pointer to the record template
   of this traffic unit. */
tmpl_ptr = get_tmpl_ptr(ti_ptr->ti_dbid);
/* Get a pointer to the attribute table. */
AT = AT_lookuptbl(ti_ptr->ti_dbid);
/* Get the type-C attributes for the traffic unit
   and send them to DS_CC. */
perform DM_TypeC_Attrs_TrafUnit();
/* Process the requests of this traffic unit. */
ri_ptr = ti_ptr->ti_first_req_pointer;
/* Get the type of the first request of
   this traffic unit. */
if req_type = RETRIEVE_COMMON
then
    /* We will only process the source request.     */
    /* The target request will not be processed     */
    /* until the record-processing process has      */
    /* retrieved all of the source records.         */
    /* Perform the descriptor search processing.    */
    done = NINS_SR_DESC(&rid, ri_ptr, tmpl_ptr, AT);
    if done
    then
        /* Broadcast the descriptor ids to the
           other backends. */
        perform DM_Broadcast_DIDs(&rid);
    end if;
else
    /* This is not a retrieve-common transaction, so
       process the requests of the traffic unit
       one-by-one. */
end if;
end procedure DM_ParsedTrafUnit;

procedure DM_RecP_Msg();
/*******************************************************
 * This procedure is used when there is a message
 * for DM from RECP (in the same backend).
 *
 * We add a new message type to indicate that all
 * of the source records have been retrieved.
 *
 * No new data structures or variables are used.
 * The original procedure is called by
 * DM_THIS_BE_MSG() and is in the dirman.c file.
 *******************************************************/

/* Get the message type. */
MsgType = DM_R$Type;
switch (MsgType)
    case OldNewValue:
        perform DM_OldNewValues();
    case UpdFinished:
        perform DM_UpdFinished();
    case Source_finished:
        /* This is the message which indicates the
           completion of the retrieval of all the
           source records. */
        perform DM_Source_finished(msg);
end switch;
end procedure DM_RecP_Msg;

procedure DM_Source_finished(input: message);
/*******************************************************
 * This procedure is used when DM receives a message
 * from RECP which indicates the completion of the
 * retrieval of all of the source records. DM is now
 * ready to process the target request.
 *
 * This procedure is called by DM_RecP_Msg().
 *******************************************************/

/* Receive the request id from the message. */
perform DM_R$Rid(source_req_id);

/* Get a pointer to the traf_info entry by the
   source_req_id. */
ti_ptr = DM_TiFind(source_req_id);

/* Get a pointer to the req_info entry for the source
   request. */
source_req_info_ptr = DM_RiFind(req_id, ti_ptr);

/* Get a pointer to the req_info entry for the target
   request by the source_req_info_ptr. */
target_ri_ptr = source_req_info_ptr->next_req_info;

/* Get the request id of the target request. */
target_req_id = Find_request_id(target_ri_ptr);

/* Perform the directory operations on the
   target request. */
/* Get the record template for the target request. */
tmpl_ptr = get_tmpl_ptr(ti_ptr->ti_dbid);
/* Get a pointer to the attribute table. */
AT = AT_lookuptbl(ti_ptr->ti_dbid);
/* Perform the descriptor search processing. */
done = NINS_SR_DESC(&rid, ri_ptr, tmpl_ptr, AT);
if done
then
    /* Broadcast the descriptor ids to the other
       backends. */
    perform DM_Broadcast_DIDs(&rid);
end if;
end procedure DM_Source_finished;

APPENDIX C
THE MODIFIED RECORD PROCESSING PROGRAM SPECIFICATIONS

In this part of the appendix, we have added the
retrieve-common subfunction into the control function of the
physical-data-operation subprocess of the record-processing
process (RECP). We have presented only the modified portion
of the original RECP in this appendix.


procedure ReqProcessing(input: MsgType);
/*******************************************************
 * This procedure is used to process requests according
 * to the request type.
 * We add the retrieve-common request type into the
 * switch statements as one of the optional cases.
 * This procedure is called by the procedure RP_DM. The
 * original procedure is in the recproc.c file.
 *******************************************************/

/* Get the request type. */
switch (request_type)
    RETRIEVE_COMMON:
        perform ST_RetDel();
        /* From this point, we use the same
           procedures as used for the
           RETRIEVE request processing. */
/* Now, back to the original ReqProcessing(). */
end procedure ReqProcessing;

procedure RP_ReadCompleted();
/*******************************************************
 * This procedure is used when a physical read is
 * completed. We add the retrieve-common request
 * type into its switch statements as one of the
 * request type cases.
 * This procedure is called by the procedure RP_FP.
 * The original procedure is in the recproc.c file.
 *******************************************************/

/* Get the request type of this request. */
switch (request_type)
    RETRIEVE_COMMON:
        perform EC_Ret();
    RETRIEVE:
        perform EC_Ret();
    /* Now, back to the original processing. */
end switch;
end procedure RP_ReadCompleted;


procedure RB$SEND_COMPLETION(input: RB_ptr, reqtype);
/*******************************************************
 * This procedure does the following tasks:
 * 1. Send the contents of the result buffer to
 *    either the hashing module or the controller,
 *    depending on the request type.
 * 2. If this is a source request of a retrieve-
 *    common request, then send a message to DM
 *    indicating that all of the source records
 *    have been retrieved.
 * 3. Send a message to CC to release the locks on
 *    the database for this request.
 * 4. Free the result buffer space after the
 *    contents of the result buffer have been sent.
 *
 * All of the data structures and variables are the
 * same as the original procedure.
 * This procedure is called by the procedure
 * EC_Ret().
 * The original procedure is in the recproc.c file.
 *******************************************************/

/* Get the request id by the result buffer pointer
   RB_ptr. */
request_id = RB_ptr->RB_rid;
if reqtype = RETRIEVE_COMMON
then
    if the result_buffer is full
    then
        /* Send the contents of the result buffer */
        /* to the hashing module and reinitialize */
        /* the buffer size to 0.                  */
        perform HASH_FUNC(request_id, result, result_length);
        result_length = 0;
    end if;

    if this is the last result buffer
       for this request
    then
        /* Send the result buffer to the
           hashing module. */
        perform HASH_FUNC(request_id, result,
                          result_length);
        if this is a source request
        then
            /* Send a message to DM indicating */
            /* that all of the source records  */
            /* have been retrieved.            */
            perform DM_FinReq$RP_S(request_id);
        end if;
        /* Free the result buffer space. */
        perform Recp_free(request_id);
        /* Send a message to CC to      */
        /* release the locks for this   */
        /* request.                     */
        perform CC_FinReq$RP_S(request_id);
    end if;
else
    /* This request is not a retrieve-common
       request.
       Now, back to the original processing. */
end if;
end procedure RB$SEND_COMPLETION;

procedure XTRACT(input:  TRACK_BUFFER, indexB, result2,
                         request, tmpl_ptr, target_ptr;
                 output: result2);
/*******************************************************
 * This procedure extracts the attribute names and
 * values which correspond to the target list
 * of a record.
 * This procedure is called by the procedure
 * RETR_PROCESSING().
 * The original procedure is in the rbabs.c file.
 * We add an end-of-record marker, EOR, at the end
 * of every record.
 *******************************************************/

/* Process all statements of the original procedure
   until the end of the outermost while loop. */

/* Add the following processing. */
if the reqtype = RETRIEVE_COMMON
then
    put the EORecord marker into the result buffer;
end if;

/* Now, back to the original processing. */
end procedure XTRACT;

procedure RB$PUT_SEND(input: RESULT_BUFFER, result,
                             length_of_result);
/*******************************************************
 * This procedure puts the results for a request
 * into the result buffer. If the result buffer is
 * full, then the contents of the buffer are sent to
 * the controller or the hashing module and the
 * length of the buffer is set to 0.
 * This procedure is called by the procedure
 * RETR_PROCESSING().
 * The original procedure is in the rbabs.c file.
 *******************************************************/

if the result buffer is full
then
    /* Find the request type in the result buffer. */
    reqtype = FIND_req_type(result_buffer);
    if reqtype = RETRIEVE_COMMON
    then
        /* Send the results to the hashing module. */
        perform HASH_FUNC(result_buffer);
    else
        /* Send the results to the controller. */
        perform RES$CNTL$RP_S(request_id, results,
                              length_of_result);
    end if;
    length_of_result = 0;
else
    /* Store the results into the result buffer. */
    /* Now, back to the original processing.     */
end if;
end procedure RB$PUT_SEND;
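
The routing decision in RB$PUT_SEND can be illustrated with a short Python sketch. The class ResultBuffer, its capacity measured in items, and the two sink callables are hypothetical stand-ins for the result buffer, the hashing module and the controller; they are not MBDS interfaces. Unlike the specification, the sketch stores the incoming result after a flush so that no result is lost.

```python
RETRIEVE_COMMON = "RETRIEVE_COMMON"  # stand-in for the request-type constant


class ResultBuffer:
    """Collects results and flushes them to a sink chosen by request type."""

    def __init__(self, capacity, request_type, hash_sink, controller_sink):
        self.capacity = capacity
        self.request_type = request_type
        self.hash_sink = hash_sink
        self.controller_sink = controller_sink
        self.items = []

    def flush(self):
        # Retrieve-common results go to the hashing module,
        # all other results go to the controller.
        if self.request_type == RETRIEVE_COMMON:
            self.hash_sink(list(self.items))
        else:
            self.controller_sink(list(self.items))
        self.items.clear()  # corresponds to setting the length to 0

    def put(self, result):
        if len(self.items) >= self.capacity:
            self.flush()
        self.items.append(result)


hashed, sent = [], []
buffer = ResultBuffer(2, RETRIEVE_COMMON, hashed.append, sent.append)
for record in ["a", "b", "c"]:
    buffer.put(record)
```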

procedure RP_CNL_ANOTHER_BE_MSG();
/*******************************************************
 * The purpose of this procedure is to process
 * the messages received from the controller or
 * the other backends.
 * This procedure is modified for processing
 * the hashed information of the non-local target
 * records.
 * The original procedure is in the recproc.c file.
 *******************************************************/

/* Get the message type. */
MsgType = Type$RP_R;

case MsgType of
    Bucket_info:
        /* This message is the hashed information */
        /* for the non-local target records.      */
        perform PROCESS_BE_TARGET();
        /* This procedure should return the sender, */
        /* the request_id of the target request     */
        /* and whether or not this is the last      */
        /* message from this backend.               */

        /* Check to see if all the target records */
        /* of all the other backends have been    */
        /* received.                              */
        if LAST_MSG
        then
            perform CHECK_RECEIVE_MSG(sender,
                        request_id, ALL_RECEIVED);
        end if;
        if ALL_RECEIVED
        then
            perform START_TO_MERGE(request_id);
            /* The called routine will perform      */
            /* the merging operation and send the   */
            /* results to the controller.           */
        end if;

    /* Now, back to the original processing. */
end case;

end procedure RP_CNL_ANOTHER_BE_MSG;

procedure PROCESS_BE_TARGET(input:  message;
                            output: sender, request_id,
                                    LAST_RECORD);
/*******************************************************
 * This procedure is called to process the message
 * which contains the hashed bucket information of
 * the non-local target records.
 * This procedure will return the sender of the
 * message, the request id of those non-local
 * records and a boolean variable, LAST_RECORD, to
 * indicate that all of the target records from the
 * sending backend have been received.
 *
 * Data structures and variables used in this
 * procedure are:
 * 1. LAST_RECORD: A boolean variable which is
 *                 used to indicate the end of
 *                 this request.
 * 2. message: A character string which is used
 *             to store the hashed results of
 *             target records and is sent from
 *             the other backends.
 *******************************************************/

/* Get the sender of the message. */
perform GET_MSG_SENDER(sender);

/* Get the request id of the request. */
perform GET_REQUEST_ID(request_id);

/* Now, check the global table to find the address */
/* of the hashing table for this request.          */
perform CHECK_GLOBAL_TABLE(request_id, hash_table,
                           NEW_REQUEST);
NEW_RECORD = true;

/* Since the message is an array of characters,        */
/* we have to bypass the header to get the record      */
/* information. If this message is the last message    */
/* of the sending backend, then there will be an       */
/* end-of-request marker, EORequest, in front of the   */
/* end-of-message marker.                              */
I = the_integer_which_stands_for_the_index_where_record_start;
/* Get the bucket numbers and their associated         */
/* records from the message, then insert them into     */
/* the correct buckets of the hashing table.           */
while ((not end of message) and (not end of request)) do
    perform GET_BUCKET_NUMBER(message, I, bucket_number);
    /* Get the bucket number of the record and the     */
    /* record itself from the message, and then        */
    /* store the record into the appropriate bucket    */
    /* of the hashing table by using the               */
    /* bucket number.                                  */
    perform GET_A_RECORD_SET(message, I, set);
    perform STORE_RECORD_IN_HASH_TABLE(hash_table,
                bucket_number, set, NEW_RECORD);
    NEW_RECORD = false;
end while;
if EORequest
then LAST_RECORD = true;
else LAST_RECORD = false;
end if;
end procedure PROCESS_BE_TARGET;
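
The message scan performed by PROCESS_BE_TARGET can be sketched in Python as follows. The single-character markers, the textual message layout (bucket number, end-of-value marker, record, end-of-record marker) and the name load_buckets are assumptions made for this illustration; the actual MBDS message format is not reproduced here.

```python
from collections import defaultdict

# Hypothetical one-character markers standing in for EOV, EORecord,
# EORequest and the end-of-message marker.
EOV, EOREC, EOREQ, EOM = "\x1f", "\x1e", "\x1d", "\x04"


def load_buckets(message, start, hash_table):
    """Store every (bucket, record) pair of message into hash_table.

    Scanning begins at index start (just past the header). Returns True
    when the end-of-request marker is found, i.e. the sender has no
    further records for this request.
    """
    i = start
    while message[i] not in (EOREQ, EOM):
        eov = message.index(EOV, i)            # end of the bucket number
        bucket = int(message[i:eov])
        eorec = message.index(EOREC, eov + 1)  # end of the record
        hash_table[bucket].append(message[eov + 1:eorec])
        i = eorec + 1
    return message[i] == EOREQ


table = defaultdict(list)
msg = "HDR" + "3" + EOV + "smith|10" + EOREC + "12" + EOV + "jones|20" + EOREC + EOREQ + EOM
last = load_buckets(msg, 3, table)
```

The loop stops at whichever of the two markers comes first, which is why the end-of-message and end-of-request tests are combined with "and" in the loop condition.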

procedure START_TO_MERGE(input: request_id);
/*******************************************************
 * This procedure is called when the target record
 * set has been received from all of the other
 * backends.
 * The input request_id is the request id of the
 * target request.
 * The data structures and the variables used in
 * this procedure are:
 * 1. TARGET_TABLE: The hashing table for the
 *                  target request.
 * 2. SOURCE_TABLE: The hashing table for the
 *                  source request.
 * 3. target_id: The request id of the target
 *               request.
 * 4. source_id: The request id of the source
 *               request.
 *******************************************************/

target_id = request_id;

/* Get the source request id. */
perform GET_SOURCE_ID(target_id, source_id);

/* Get the hashing table of the source request. */
perform CHECK_GLOBAL_TABLE(source_id, global_table,
                           source_hash_table,
                           NEW_REQUEST);
/* Get the hashing table of the target request. */
perform CHECK_GLOBAL_TABLE(target_id, global_table,
                           target_hash_table,
                           NEW_REQUEST);
/* Merge the records of these two requests and send */
/* the results to the controller.                   */
perform MERGE(source_id, source_hash_table.address,
              target_hash_table.address);
end procedure START_TO_MERGE;
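
The following Python sketch does not reproduce MERGE; it only illustrates the idea of merging two hashing tables bucket by bucket on equal common attribute values. The table representation (a dictionary from bucket number to a list of (common value, record) pairs) and the name merge_buckets are assumptions of the sketch.

```python
def merge_buckets(source_table, target_table):
    """Pair source and target records that share a common attribute value.

    Only records hashed into the same bucket are compared, since equal
    values always hash to the same bucket.
    """
    merged = []
    for bucket, source_rows in source_table.items():
        target_rows = target_table.get(bucket, [])
        for s_value, s_record in source_rows:
            for t_value, t_record in target_rows:
                if s_value == t_value:
                    merged.append((s_value, s_record, t_record))
    return merged


source_table = {1: [("d10", "smith")], 2: [("d20", "jones")]}
target_table = {1: [("d10", "sales"), ("d11", "ops")]}
result = merge_buckets(source_table, target_table)
```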


procedure GET_SOURCE_ID(input:  request_id;
                        output: request_id);
/*******************************************************
 * This procedure is used to find the request id for
 * the source request by using the request id of the
 * target request.
 * Recall that the source request and the target
 * request have the same traffic id; the difference
 * between them is that the request number of the
 * source request is less than that of the target
 * request by 1.
 *******************************************************/
end procedure GET_SOURCE_ID;
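
A minimal Python sketch of this rule is given below. It assumes, for illustration only, that a request id can be modelled as a pair of traffic id and request number; the type RequestId and its field names are hypothetical.

```python
from typing import NamedTuple


class RequestId(NamedTuple):
    traffic_id: int       # shared by the source and the target request
    request_number: int   # position of the request in the traffic unit


def get_source_id(target: RequestId) -> RequestId:
    """Return the id of the source request that precedes the target."""
    if target.request_number < 1:
        raise ValueError("a target request must follow its source request")
    return target._replace(request_number=target.request_number - 1)


source_id = get_source_id(RequestId(traffic_id=42, request_number=2))
```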

procedure CHECK_RECEIVE_MSG(input:  sender, request_id;
                            output: ALL_RECEIVED);
/*******************************************************
 * This procedure is used to check whether all
 * of the non-local target records have been
 * retrieved from all of the other backends for
 * a particular request. If all of the non-local
 * target records have been received, then
 * ALL_RECEIVED is set to true. Otherwise,
 * ALL_RECEIVED is set to false.
 *******************************************************/
end procedure CHECK_RECEIVE_MSG;
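
One possible way to keep the bookkeeping behind CHECK_RECEIVE_MSG is sketched below in Python. The class ReceiveTracker and the backend names are hypothetical; the sketch assumes that the set of other backends is known in advance and that the check is made once per sender, when its last message for the request arrives.

```python
class ReceiveTracker:
    """Remembers which backends have finished sending for each request."""

    def __init__(self, other_backends):
        self.expected = frozenset(other_backends)
        self.finished = {}  # request_id -> set of senders that are done

    def check_receive_msg(self, sender, request_id):
        """Record that sender is done; return True when all are done."""
        done = self.finished.setdefault(request_id, set())
        done.add(sender)
        return done >= self.expected


tracker = ReceiveTracker({"BE2", "BE3"})
first = tracker.check_receive_msg("BE2", 7)
second = tracker.check_receive_msg("BE3", 7)
```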


procedure CHECK_GLOBAL_TABLE(input:  request_id;
                             output: hash_table,
                                     NEW_REQUEST);
/*******************************************************
 * This procedure is used to check whether a request
 * is a new request by checking if the request id is
 * in the global table. If the id is found, then set
 * the value of NEW_REQUEST to false and return
 * NEW_REQUEST and the hash_table of the request.
 * This procedure has been defined in HASH_FUNC().
 *******************************************************/
end procedure CHECK_GLOBAL_TABLE;

procedure GET_BUCKET_NUMBER(input:  message, index;
                            output: index, bucket_number);
/*******************************************************
 * This procedure is used to extract the bucket
 * numbers from the message, then return the
 * bucket_number and the incremented index to its
 * caller.
 * Data structures and variables used in this
 * procedure:
 * 1. bucket: A character string representation
 *            of the bucket number.
 * 2. j: A general purpose index.
 *******************************************************/

j = 0;
repeat
    bucket[j] = message[index];
    index = index+1;
    j = j+1;
until message[index] = EOV;
perform STRING_TO_INTEGER(bucket, bucket_number);
end procedure GET_BUCKET_NUMBER;

procedure GET_A_RECORD_SET(input:  message, I;
                           output: set);
/*******************************************************
 * This procedure is used to extract the common
 * attribute value of a record and the record itself
 * from the message which contains the hashed bucket
 * information of the non-local target records.
 *
 * The data structures and the variables used in
 * this procedure are:
 * 1. set: An array which contains the common
 *         attribute value of a record and the
 *         record itself.
 * 2. J: A general purpose index.
 *******************************************************/

J = 0;
repeat
    set[J] = message[I];
    I = I+1;
    J = J+1;
until message[I-1] = EORecord;
end procedure GET_A_RECORD_SET;

APPENDIX D
THE HASHING PROCEDURE PROGRAM SPECIFICATIONS

procedure HASH_FUNC(input:  request_id, result, length;
                    output: request_id, hashed_result,
                            length_hashed_result);
/*******************************************************
 * The purpose of this procedure is to hash the value
 * of the join attribute into a bucket of the hash
 * table.
 * A hash buffer is reserved to store the hashed
 * results.
 * Data structures and variables used in this
 * procedure are:
 * 1. hash_buffer: A variable of the data type
 *                 hashing_buffer which is used
 *                 to store the records and their
 *                 hashed bucket values, and is
 *                 defined in hashing_module.def.
 * 2. RP_rid_info: The information for a request.
 *                 This structure is defined in
 *                 the commdata.def file.
 * 3. RP_rid_ptr:  A pointer to the data structure
 *                 of type RP_rid_info.
 * 4. req_tbl_ptr: A pointer to a request table.
 *                 The request table is defined in
 *                 the commdata.def file as a
 *                 REQtbl_definition structure.
 * 5. temp_entry:  A variable of data type rt_ntry
 *                 which is defined in commdata.def.
 * 6. tem_ptr:     A pointer to temp_entry.
 * 7. rt_entry:    A pointer to a field of RP_rid_info.
 *                 The type of this field is rt_ntry.
 *******************************************************/

/* Check if the request id is a new request. */
if new request
then
    /* Get the record template to find the value        */
    /* type (i.e., integer, string or float) of the     */
    /* common attribute value.                          */
    perform FIND_RP_rid_info(request_id, RP_rid_ptr);
    /* Get a pointer to the request table from the      */
    /* RP_rid_info.                                     */
    req_tbl_ptr = RP_rid_ptr->RP_ri_req;
    /* Find the attribute name from
       the request table. */
    perform FIND_COMMON_ATTRIBUTE(req_tbl_ptr,
                                  attribute_name);
    /* Get a pointer to the entry                       */
    /* of the template for the common attribute.        */
    tem_ptr = RP_rid_ptr->RP_ri_tmpl_ptr->rt_entry;
    /* Get the value type of the common attribute       */
    /* from the record template.                        */
    if tem_ptr->temp_entry.value_data_type = 's'
    then
        value_type = string;
    else
        /* If the value type is integer, then           */
        /* we decide which hashing function to          */
        /* use.                                         */
        MAX = tem_ptr.value_c1;   /* The possible       */
                                  /* maximum value      */
                                  /* for this           */
                                  /* attribute.         */
        MIN = tem_ptr.value_c2;   /* The possible       */
                                  /* minimum value      */
                                  /* for this           */
                                  /* attribute.         */
        if (MAX-MIN) < the_number_of_buckets
        then
            value_type = small_integer;
        else
            range = (MAX-MIN) / the_number_of_buckets;
            value_type = large_integer;
        end if;
    end if;
end if;

/* Allocate a buffer to store the hashed results. */
perform ALLOCATE_HASH_BUFFER(hash_buffer);
/* Note: we may not want to call this     */
/* routine at this point.                 */

switch (value_type)
    case string:
        perform STRING_HASH(result,
                            hash_buffer);
    case small_integer:
        perform SMALL_INTEGER_HASH(result, MIN,
                                   hash_buffer);
    case large_integer:
        perform LARGE_INTEGER_HASH(result, MIN,
                                   range,
                                   hash_buffer);
end switch;
end procedure HASH_FUNC;
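
The choice between the two integer hashing functions can be illustrated with the Python sketch below. The bodies of SMALL_INTEGER_HASH and LARGE_INTEGER_HASH are not given above, so the two formulas used here (offset from the minimum, and offset divided by the range) are assumptions suggested by the parameters MIN and range; the final clamp to the last bucket is also an addition of the sketch, needed because integer division can otherwise yield an index equal to the number of buckets.

```python
def make_integer_hash(min_value, max_value, n_buckets):
    """Return a function that maps an attribute value to a bucket number."""
    span = max_value - min_value
    if span < n_buckets:
        # Small range: every distinct value gets its own bucket.
        return lambda value: value - min_value
    # Large range: each bucket covers `width` consecutive values.
    width = span // n_buckets
    return lambda value: min((value - min_value) // width, n_buckets - 1)


small_hash = make_integer_hash(0, 9, 16)
large_hash = make_integer_hash(100, 200, 8)
```

For example, with values between 100 and 200 and 8 buckets, each bucket covers 12 consecutive values, so the value 150 falls into bucket 4.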

procedure FIND_COMMON_ATTRIBUTE(input:  request table;
                                output: attribute name);
/*******************************************************
 * This procedure is used to find the name of the
 * join attribute.
 * The join attribute is the first attribute of the
 * target list, so we can just go to the entry
 * where the target list begins and extract the first
 * attribute name and then return it to the calling
 * procedure.
 *******************************************************/
end procedure FIND_COMMON_ATTRIBUTE;


procedure ALLOCATE_BUFFER(input:  request_id;
                          output: hash_buffer);
/*******************************************************
 * This procedure is used to allocate a buffer for
 * storing the records and their hashed bucket numbers,
 * set the length of the buffer to 0, and then
 * return the buffer to the calling procedure.
 *
 * The data structures and the variables used in
 * this procedure are:
 * 1. hash_buffer:
 *    A variable of the data type hashing_buffer,
 *    which is defined in hashing_module.def
 *    (see Appendix G).
 * 2. HB_ptr:
 *    A pointer to the hash_buffer.
 * 3. HB_id:
 *    A field name of the hash_buffer that
 *    contains the request id of the records
 *    which belong to this buffer.
 *******************************************************/

HB_ptr = allocate the hash buffer;
HB_ptr->HB_id = request_id;
HB_ptr->length = 0;
end procedure ALLOCATE_BUFFER;


procedure STRING_HASH(input: result buffer, h_buffer);
/*******************************************************
 * This procedure is called when the value type
 * of the common attribute is a character string.
 * It performs the following tasks:
 * 1. Extract records from the input result buffer
 *    one at a time.
 * 2. Extract the value of the join attribute
 *    from the extracted record and then check the
 *    lookup table to get the bucket number for
 *    the record.
 * 3. Store the bucket number and the record into
 *    a reserved hash buffer, h_buffer.
 * 4. If the hash buffer is full, then send the
 *    hash buffer to the Bucket-block tracking
 *    procedure.
 *
 * Data structures and variables used in this
 * procedure are:
 * 1. attribute_value: A character-string
 *                     representation of the common
 *                     attribute value.
 * 2. record: A character-string representation
 *            of the extracted record.
 * 3. bucket_number: The bucket number where the
 *                   record characterized by the
 *                   common attribute value is
 *                   hashed into.
 * 4. bucket: A character-string representation
 *            of the bucket_number.
 * 5. EOV: The end-of-value marker.
 * 6. EON: The end-of-name marker.
 * 7. EOB: The end-of-buffer marker.
 * 8. LAST_RECORD: A boolean variable to indicate
 *                 that this record is the last
 *                 record for the request.
 * 9. i: The index for the length of the result
 *       buffer.
 *    j: A general purpose index.
 * 10. lookup: The lookup table, which is an array
 *             with 2048 character-string elements.
 *
 *                      abal
 *                      abc
 *                      ...
 *             2047     zyth
 *
 * 11. h_buffer: A variable of type hash_buffer
 *               which is defined in
 *               hashing_module.def (see Appendix G)
 *               and is used to store records and
 *               their hashed values.
 *******************************************************/
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/*  Get  the  lookup  table.  V 

i  -  18 

j  =  0; 

IAST_RECORD    =    false; 

/*   Get    records    frcm   the    result    buffer    one   at    a   time.    */ 

while   result_buf f €r[ i  ]  <>    EOB   do 

/*   Bypass    the    name    of    the   common  attribute.  */ 

while   result_buf fer[i ]   <>   EON    do 

i  =  i+1; 
end   while;    /*    New,    result_buf f er[  i ]    =    EON.  */ 

i  =   i+1; 

/*  Get  the   value  of   the    join  attribute.  */ 

While   result_buffer[i ]    <>   EOV    do 

attribute_value[  j ]   =    result_buf f er[ i  ]; 
i   =    i+  1 ; 

j  =  j+1; 

end   while;    /*   New,    result_buf f er[ i  ]   =    EOV.  */ 

/*  Compare   the    common    attribute  value   with  */ 

/*   the  contents  of    the    lookup    table    to  get    the  */ 

/*   bucket-number.  */ 

bucket_numbers  =    BI_S EARCH (lookup,    attribute_Eumber) ; 
perform   NU1BER_10_STRING (bucket_number,    bucket) ; 
/*   Add  a    EOV    marker   to   the   end   of 

the  attribute    value.    */ 
attribute_value[ j ]  =    EOV 
/*   Extract   records   from   the   buffer.    */ 
i    =   i+1; 
j    =    0; 
repeat 

record[j]  =    result_buf f er[ i ]; 

i   =    i+1 ; 

until  result_buf fer[ i- 1 ]   =    EORecord; 
/*   New,    record£j]   =   EORecord.    */ 
if    result_buf fer[ i  ]  =    EOEequest 
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then 

LAST_RECORL  =  true; 
i  =  i  +  1  ; 
end  if; 
/*  Store  the  hashed  information  into  the 

hash  buffer,  h_buffer.   */ 
perform  PUT_HASH_BUFFER (h_buff er,  bucket, 

attribute_value,  record, 
IAST_RECORD) ; 
end  while; 
end  procedure  STRING_HASH; 
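
The lookup step of STRING_HASH can be sketched in Python as follows. This is a minimal sketch under assumptions that the specification does not state: the lookup table is taken to hold sorted boundary strings, one per bucket, and BI_SEARCH is taken to return the index of the last boundary that is not greater than the attribute value. The function name bi_search and the four-entry table are hypothetical; the table in the specification has 2048 entries.

```python
from bisect import bisect_right


def bi_search(lookup, attribute_value):
    """Return the bucket number for attribute_value.

    Assumption: lookup is a sorted list of boundary strings, and a value
    belongs to the bucket of the last boundary that is <= the value.
    Values that sort before the first boundary fall into bucket 0.
    """
    position = bisect_right(lookup, attribute_value) - 1
    return max(position, 0)


# Hypothetical, shortened table; the specification uses 2048 entries.
lookup = ["abal", "abc", "m", "zyth"]
print(bi_search(lookup, "abb"))   # sorts between "abal" and "abc"
print(bi_search(lookup, "zzz"))   # sorts after the last boundary
```

A binary search keeps the cost per record logarithmic in the table size, which matters because the lookup is repeated for every record in the result buffer.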


procedure PUT_HASH_BUFFER (input: h_buffer, bucket,
                                  attribute_value, record,
                                  LAST_RECORD;
                           output: h_buffer);
/*****************************************************
*  This procedure is used to store the hashed  *
*  record information into the hash_buffer.  *
*  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  1. X, Y, Z, i, j, K: General purpose indexes.  *
*  2. MAX: The predefined maximum length of the  *
*        hash buffer.  *
*  3. bucket: A character-string representation  *
*        of bucket_number.  *
*  4. record: The input record which is in the  *
*        form of a character string.  *
*  5. LAST_RECORD: A boolean variable which is  *
*        used to indicate the end of  *
*        this request.  *
*  6. h_buffer: A buffer which is used to store  *
*        records and their hashed values.  *
*****************************************************/

/*  Check to see if the buffer has enough space for  */
/*  the new record.  */
X = String_len(bucket);
Y = String_len(attribute_value);
Z = String_len(record);
K = the_current_length_of_the_hash_buffer;
if (K + X + Y + Z) > MAX
then
    /*  The buffer is full, so it is sent to the  */
    /*  bucket-block tracking procedure.  */
    perform BUCKET_BLOCK(h_buffer);
    /*  Reset the length of the buffer to 0.  */
    K = 0;
else
    /*  The buffer has enough space, so store the  */
    /*  input record into the buffer.  */
    for i = 1 to X do
        K = K + 1;
        hash_result[K] = bucket[i];
    end for;
    for i = 1 to Y do
        K = K + 1;
        hash_result[K] = attribute_value[i];
    end for;
    for i = 1 to Z do
        K = K + 1;
        hash_result[K] = record[i];
    end for;
    /*  If this is the last record of this request,  */
    /*  then send the hash_buffer to the  */
    /*  bucket-block tracking procedure.  */
    if LAST_RECORD
    then
        hash_result[K+1] = EORequest;
        hash_result[K+2] = EOB;
        perform BUCKET_BLOCK(h_buffer);
        perform FREE_BUFFER_SPACE(h_buffer);
    end if;
end if;
end procedure PUT_HASH_BUFFER;
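
The buffering rule of PUT_HASH_BUFFER can be sketched in Python. This is only an illustration under stated assumptions: the class name HashBuffer, the send callback (a stand-in for BUCKET_BLOCK), and the characters "|" and ";" (stand-ins for the EOV and EORecord markers) are hypothetical, and the EORequest and EOB markers are omitted. The sketch also assumes that the record which triggers a flush is stored in the emptied buffer afterwards.

```python
class HashBuffer:
    """Fixed-capacity buffer of hashed records (illustrative only)."""

    def __init__(self, request_id, max_length, send):
        self.request_id = request_id
        self.max_length = max_length
        self.send = send          # stand-in for BUCKET_BLOCK
        self.content = ""

    def put(self, bucket, attribute_value, record, last_record=False):
        entry = bucket + attribute_value + record
        if len(self.content) + len(entry) > self.max_length:
            # The buffer is full: hand it on and start again from length 0.
            self.flush()
        # Assumption: the entry that triggered the flush is stored afterwards.
        self.content += entry
        if last_record:
            self.flush()

    def flush(self):
        self.send(self.request_id, self.content)
        self.content = ""


sent = []
buf = HashBuffer("req-1", max_length=16,
                 send=lambda request_id, data: sent.append(data))
buf.put("3|", "abc|", "r1;")                    # 9 characters, fits
buf.put("7|", "xyz|", "r2;")                    # would exceed 16, so the first entry is sent
buf.put("3|", "abd|", "r3;", last_record=True)  # last record forces a final send
print(sent)
```

Passing the downstream procedure as a callback keeps the sketch testable without the bucket-block tracking procedure.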


procedure SMALL_INTEGER_HASH (input: result_buffer,
                                     MIN,
                                     h_buffer;
                              output: h_buffer);
/*****************************************************
*  This procedure is used when the type of the  *
*  common attribute value is integer and when the  *
*  difference of the maximum and minimum value of  *
*  the common attribute value is less than the  *
*  number of the buckets of the hashing table.  *
*  It performs the following tasks:  *
*  1. Extract records from the input result buffer  *
*     one at a time.  *
*  2. Extract the value of the common attribute from  *
*     the extracted record and then calculate  *
*     the bucket number.  *
*  3. Store the bucket number and the record into  *
*     a reserved hash buffer.  *
*  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  1. attribute_value: A character-string  *
*        representation of the common  *
*        attribute value.  *
*  2. record: A character-string representation  *
*        of the extracted record.  *
*  3. bucket_number: The bucket number where the  *
*        record characterized by the  *
*        common attribute value is  *
*        hashed into.  *
*  4. bucket: A character-string representation  *
*        of the bucket_number.  *
*  5. EOV: The end-of-value marker.  *
*  6. EON: The end-of-name marker.  *
*  7. EOB: The end-of-buffer marker.  *
*  8. LAST_RECORD: A boolean variable to indicate  *
*        that this record is the last  *
*        record for the request.  *
*  9. i: The index for the length of the result  *
*        buffer.  *
*     j: A general purpose index.  *
*     k: The index for the length of the  *
*        attribute_value.  *
*  10. temp: An integer representation of the input  *
*        attribute_value.  *
*  11. h_buffer: A variable of type hash_buffer  *
*        which is defined in  *
*        hashing_module.def (see Appendix G)  *
*        and is used to store records and  *
*        their hashed values.  *
*****************************************************/

/*  Initialize the indexes.  */
i = 1;
k = 1;
j = 0;
LAST_RECORD = false;
/*  Get the records from the result buffer  */
/*  one at a time.  */
while result_buffer[i] <> EOB do
    /*  Bypass the name of the common attribute.  */
    while result_buffer[i] <> EON do
        i = i+1;
    end while;  /*  Now, result_buffer[i] is EON.  */
    i = i+1;
    /*  Get the value of the common attribute.  */
    k = 1;
    while result_buffer[i] <> EOV do
        attribute_value[k] = result_buffer[i];
        i = i+1;
        k = k+1;
    end while;  /*  Now, result_buffer[i] is EOV.  */
    /*  Compute the bucket number.  */
    perform STRING_TO_NUMBER(attribute_value, Temp);
    bucket_number = Temp - MIN;
    perform NUMBER_TO_STRING(bucket_number, bucket);
    /*  Add an EOV marker to the end of attribute value.  */
    attribute_value[k] = EOV;
    /*  Get the attribute-value pairs of the actual  */
    /*  target list of the record.  */
    i = i+1;
    j = 0;
    repeat
        record[j] = result_buffer[i];
        i = i+1;
        j = j+1;
    until result_buffer[i-1] = EORecord;
    /*  Now, record[j-1] is EORecord.  */
    if result_buffer[i] = EORequest
    then
        LAST_RECORD = true;
        i = i+1;
    end if;
    /*  Store the hashed information into the h_buffer.  */
    perform PUT_HASH_BUFFER(h_buffer, bucket,
                            attribute_value, record,
                            LAST_RECORD);
end while;
end procedure SMALL_INTEGER_HASH;
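
The two integer hashing rules of this appendix, the subtraction used by SMALL_INTEGER_HASH and the truncated division used by LARGE_INTEGER_HASH, can be written out as a short Python sketch. The function names and the parameter name width (which plays the role of range in the specification) are hypothetical, and the sketch assumes that every value is at least as large as the minimum.

```python
def small_integer_bucket(value, minimum):
    """Bucket number when (max - min) is smaller than the number of buckets."""
    return value - minimum


def large_integer_bucket(value, minimum, width):
    """Bucket number when the value range exceeds the number of buckets.

    width plays the role of 'range' in LARGE_INTEGER_HASH. Integer
    division truncates here because value >= minimum is assumed.
    """
    return (value - minimum) // width


print(small_integer_bucket(105, 100))
print(large_integer_bucket(1074, 1000, 25))
```

In the small case every distinct value gets its own bucket; in the large case each bucket covers a run of width consecutive values.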


procedure LARGE_INTEGER_HASH (input: result_buffer,
                                     MIN, range,
                                     h_buffer;
                              output: h_buffer);
/*****************************************************
*  This procedure is used when the type of the  *
*  common attribute value is integer and when the  *
*  difference of the maximum and minimum value of  *
*  the common attribute value is greater than the  *
*  number of the buckets of the hashing table.  *
*  It performs the following tasks:  *
*  1. Extract records from the input result buffer  *
*     one at a time.  *
*  2. Extract the value of the common attribute from  *
*     the extracted record and then calculate  *
*     the bucket number.  *
*  3. Store the bucket number and the record into  *
*     a reserved hash buffer.  *
*  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  1. attribute_value: A character-string  *
*        representation of the common  *
*        attribute value.  *
*  2. record: A character-string representation  *
*        of the extracted record.  *
*  3. bucket_number: The bucket number where the  *
*        record characterized by the  *
*        common attribute value is  *
*        hashed into.  *
*  4. bucket: A character-string representation  *
*        of the bucket_number.  *
*  5. EOV: The end-of-value marker.  *
*  6. EON: The end-of-name marker.  *
*  7. EOB: The end-of-buffer marker.  *
*  8. LAST_RECORD: A boolean variable to indicate  *
*        that this record is the last  *
*        record for the request.  *
*  9. i: The index for the length of the result  *
*        buffer.  *
*     j: A general purpose index.  *
*     k: The index for the length of the  *
*        attribute_value.  *
*  10. temp: An integer representation of the input  *
*        attribute_value.  *
*  11. h_buffer: A variable of type hash_buffer  *
*        which is defined in  *
*        hashing_module.def (see Appendix G)  *
*        and is used to store records and  *
*        their hashed values.  *
*****************************************************/

/*  Initialize the indexes.  */
i = 1;
k = 1;
j = 0;
LAST_RECORD = false;
/*  Get records from the result buffer one at a time.  */
while result_buffer[i] <> EOB do
    /*  Bypass the name of the common attribute.  */
    while result_buffer[i] <> EON do
        i = i+1;
    end while;  /*  Now, result_buffer[i] is EON.  */
    i = i+1;
    /*  Get the value of the join attribute.  */
    k = 1;
    while result_buffer[i] <> EOV do
        attribute_value[k] = result_buffer[i];
        i = i+1;
        k = k+1;
    end while;  /*  Now, result_buffer[i] is EOV.  */
    /*  Compute the bucket number.  */
    perform STRING_TO_NUMBER(attribute_value, Temp);
    bucket_number = TRUNC[(Temp - MIN)/range];
    perform NUMBER_TO_STRING(bucket_number, bucket);
    /*  Add an EOV marker to the end of attribute_value.  */
    attribute_value[k] = EOV;
    /*  Get the attribute-value pairs of the actual  */
    /*  target list of the record.  */
    i = i+1;
    j = 0;
    repeat
        record[j] = result_buffer[i];
        i = i+1;
        j = j+1;
    until result_buffer[i-1] = EORecord;
    /*  Now, record[j-1] is EORecord.  */
    if result_buffer[i] = EORequest
    then
        LAST_RECORD = true;
        i = i+1;
    end if;
    /*  Store the hashed information into the h_buffer.  */
    perform PUT_HASH_BUFFER(h_buffer, bucket,
                            attribute_value, record,
                            LAST_RECORD);
end while;
end procedure LARGE_INTEGER_HASH;

APPENDIX E
THE BUCKET-BLOCK-TRACKING PROCEDURE PROGRAM SPECIFICATIONS

procedure BUCKET_BLOCK (input: H_buffer);
/*****************************************************
*  This procedure receives a hash buffer, H_buffer,  *
*  from the ret_com subfunction and performs the  *
*  following tasks:  *
*  1. Establish and maintain a global table to  *
*     store the addresses of the hashing tables  *
*     of all the requests.  *
*  2. Extract the hashed record information from  *
*     the input hash_buffer.  *
*  3. Check the global table to see if the input  *
*     records belong to a new request. If they do,  *
*     then allocate a new hashing table.  *
*     Otherwise, get the logical address of the  *
*     hashing table from the global table and  *
*     assign a pointer to the hashing table.  *
*  4. Group records into the buckets according to  *
*     their bucket numbers and store them into  *
*     blocks.  *
*  5. Broadcast the bucket information of the local  *
*     target records to the other backends.  *
*  6. Store the hashing table back to the secondary  *
*     storage.  *
*  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  *
*  1. FIRST_RET_COM:  *
*        A boolean variable which is set to  *
*        true when the first retrieve-common  *
*        request enters the system.  *
*  2. GT_ptr:  *
*        A pointer to a global table.  *
*  3. G_table:  *
*        A variable of type global_table (see  *
*        Appendix G).  *
*  4. HT_ptr:  *
*        A pointer to a hashing table.  *
*  5. HT:  *
*        A variable of type Hash_table (see  *
*        Appendix G).  *
*  6. HB_ptr:  *
*        A pointer to a hash buffer.  *
*  7. H_buffer:  *
*        A variable of type hash_buffer (see  *
*        Appendix G).  *
*  8. NEW_REQUEST:  *
*        A boolean variable which is set to  *
*        true if the request id cannot be found  *
*        in the global table.  *
*  9. logical_addr:  *
*        A variable of type addr_definition,  *
*        which is defined in the commdata.def file.  *
*  10. bucket_number:  *
*        The bucket number where the record  *
*        characterized by the attribute value is  *
*        hashed into.  *
*  11. bucket:  *
*        A character-string representation of  *
*        the bucket_number.  *
*  12. req_id:  *
*        A record which contains the traffic id and  *
*        request number of a request.  *
*  13. i, j:  *
*        General purpose indexes.  *
*****************************************************/

if FIRST_RET_COM
then
    perform INITIALIZE_GLOBAL_TABLE(GT_ptr);
    FIRST_RET_COM = false;
end if;
/*  Get the request id from the input hash buffer.  */
req_id = H_buffer.Request_id;
/*  Check the global table to see if this request is  */
/*  a new request.  */
perform CHECK_GLOBAL_TABLE(GT_ptr, req_id,
                           logical_addr, NEW_REQUEST);
if NEW_REQUEST
then
    perform ALLOCATE_HASH_TABLE(logical_addr);
    perform INSERT_GLOBAL_TABLE(GT_ptr, req_id,
                                logical_addr);
end if;
perform GET_HASHING_TABLE(req_id,
                          logical_addr, HT);
/*  Now, the hashing table is ready to store records.  */
/*  Extract the record information from the  */
/*  hash buffer one record at a time.  */
/*  Because the last two characters of the hash buffer  */
/*  are the EORequest marker, which indicates whether  */
/*  this is the last hash buffer for this request,  */
/*  and the EOBuffer marker, which indicates the  */
/*  end of this hash buffer, the actual length of the  */
/*  hash buffer is length-2.  */
j = 1;
while j < (H_buffer.length-2) do
    /*  Get the bucket number.  */
    i = 0;
    repeat
        bucket[i] = H_buffer.Hashed_result[j];
        i = i+1;
        j = j+1;
    until H_buffer.Hashed_result[j] = EOV;
    /*  Convert the bucket number from a character  */
    /*  string to an integer.  */
    bucket_number = STRING_TO_INTEGER(bucket);
    /*  Get the common attribute value and the record  */
    /*  itself.  */
    j = j+1;
    i = 0;
    repeat
        common_and_record[i] = H_buffer.Hashed_result[j];
        i = i+1;
        j = j+1;
    until common_and_record[i-1] = EORecord;
    /*  Store the record and its common attribute value  */
    /*  into the hashing table.  */
    perform STORE_RECORD_IN_HASH_TABLE(HT, bucket_number,
                                       common_and_record,
                                       NEW_RECORD);
    NEW_RECORD = false;
end while;

/*  Check if this is a target request.  */
if MOD(req_id.request_no, 2) = 0
then
    /*  This is a target request.  */
    perform BROADCAST_TARGET_INFO(HT);
end if;
perform STORE_BACK(HT, logical_addr);
end procedure BUCKET_BLOCK;
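
The extraction loop of BUCKET_BLOCK can be illustrated with a short Python sketch. It rests on assumptions not confirmed by the specification: each entry is taken to be laid out as bucket digits, EOV, attribute value, EOV, record, EORecord; the marker characters chosen here are hypothetical; and the trailing EORequest and EOB markers are assumed to have been removed already. The name split_hash_buffer is also hypothetical.

```python
EOV = "\x1f"        # hypothetical end-of-value marker
EORECORD = "\x1e"   # hypothetical end-of-record marker


def split_hash_buffer(content):
    """Yield (bucket_number, common_and_record) pairs from a hash buffer.

    Assumed entry layout: <bucket digits> EOV <attribute value> EOV
    <record> EORECORD.
    """
    j = 0
    while j < len(content):
        end_of_bucket = content.index(EOV, j)
        bucket_number = int(content[j:end_of_bucket])
        end_of_record = content.index(EORECORD, end_of_bucket + 1)
        # Keep the attribute value, its EOV and the record together,
        # including the closing EORECORD, as BUCKET_BLOCK does.
        common_and_record = content[end_of_bucket + 1:end_of_record + 1]
        yield bucket_number, common_and_record
        j = end_of_record + 1


content = ("12" + EOV + "smith" + EOV + "rec-a" + EORECORD
           + "7" + EOV + "jones" + EOV + "rec-b" + EORECORD)
pairs = list(split_hash_buffer(content))
print([bucket for bucket, _ in pairs])
```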


procedure INITIALIZE_GLOBAL_TABLE (output: GT_ptr);
/*****************************************************
*  This procedure is used when the first retrieve-  *
*  common request is executed in the BUCKET_BLOCK  *
*  procedure.  *
*  This procedure creates a global table and  *
*  returns the pointer (GT_ptr) to the table to  *
*  the calling procedure.  *
*****************************************************/
end procedure INITIALIZE_GLOBAL_TABLE;


procedure ALLOCATE_HASH_TABLE (output: logical_addr);
/*****************************************************
*  This procedure is used to allocate a hashing  *
*  table for a new retrieve-common request from  *
*  a predefined secondary storage area and return  *
*  the logical disk address to the calling  *
*  procedure.  *
*  The bucket entries are also initialized.  *
*****************************************************/
end procedure ALLOCATE_HASH_TABLE;


procedure CHECK_GLOBAL_TABLE (input: GT_ptr, request_id;
                              output: logical_addr, NEW_REQUEST);
/*****************************************************
*  This procedure is used to check whether a request  *
*  is a new request by checking its request id  *
*  against the global table. If the request id is  *
*  found in the global table, then set the value of  *
*  NEW_REQUEST to false and return the logical disk  *
*  address of the hashing table to the calling  *
*  procedure. Otherwise, return the NEW_REQUEST  *
*  back to the calling procedure.  *
*****************************************************/
end procedure CHECK_GLOBAL_TABLE;


procedure INSERT_GLOBAL_TABLE (input: GT_ptr, Req_id,
                                      logical_addr;
                               output: GT_ptr);
/*****************************************************
*  This procedure is used to insert a new hashing  *
*  table into the global table.  *
*  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  1. GT_ptr:  *
*        A pointer to the global table.  *
*  2. Req_id:  *
*        The request id of the records of the new  *
*        hashing table.  *
*  3. logical_addr:  *
*        The logical disk address of the new hashing  *
*        table.  *
*  *
*  An inverted list implementation to maintain the  *
*  table is recommended.  *
*****************************************************/
end procedure INSERT_GLOBAL_TABLE;
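
The interplay of CHECK_GLOBAL_TABLE and INSERT_GLOBAL_TABLE can be sketched as follows. Note that the specification recommends an inverted list; this sketch substitutes a Python dictionary purely for brevity, so it illustrates the interface rather than the recommended implementation. The class name GlobalTable, the tuple used as a request id, and the address value are hypothetical.

```python
class GlobalTable:
    """Maps a request id to the logical address of its hashing table."""

    def __init__(self):
        self._addresses = {}

    def check(self, request_id):
        """Return (logical_addr, new_request), as CHECK_GLOBAL_TABLE does."""
        if request_id in self._addresses:
            return self._addresses[request_id], False
        return None, True

    def insert(self, request_id, logical_addr):
        self._addresses[request_id] = logical_addr


table = GlobalTable()
request_id = ("traffic-1", 4)          # (traffic id, request number)
addr, new_request = table.check(request_id)
if new_request:
    table.insert(request_id, 0x200)    # hypothetical logical disk address
print(table.check(request_id))
```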


procedure STORE_RECORD_IN_HASH_TABLE
                    (input: HT, bucket_number,
                            info, NEW_RECORD);
/*****************************************************
*  This procedure is used to store the common  *
*  attribute value of a record and the record itself  *
*  into a hashing table.  *
*  Recall that the records are stored in blocks.  *
*  *
*  Data structures and the variables used in this  *
*  procedure are:  *
*  1. HT:  *
*        A variable of type hash_table which is  *
*        defined in hashing_module.def (see Appendix  *
*        G).  *
*  2. bucket_number:  *
*        The bucket number where the record  *
*        characterized by the common attribute value  *
*        is hashed into.  *
*  3. info:  *
*        A character string which contains the  *
*        common attribute value of a record and the  *
*        record itself.  *
*  4. NEW_RECORD:  *
*        A boolean variable to indicate whether the  *
*        input info is a new record of this request  *
*        id.  *
*  5. old_bucket_number:  *
*        The bucket_number of the previous input  *
*        record.  *
*  6. bkt:  *
*        A variable of type BUCKET_ENTRY which is  *
*        defined in hashing_module.def (see Appendix  *
*        G).  *
*  7. blk_ptr:  *
*        A pointer to a record block of type  *
*        REC_BLOCK which is defined in  *
*        hashing_module.def (see Appendix G).  *
*  8. blk, blk_2:  *
*        Variables of type REC_BLOCK which is defined  *
*        in hashing_module.def (see Appendix G).  *
*  9. I:  *
*        An integer variable.  *
*  10. MAX_BLOCK_SIZE:  *
*        An integer that represents the maximum  *
*        length of the block content.  *
*****************************************************/

if NEW_RECORD
then
    /*  This record is the first input record of this  */
    /*  request.  */
    perform GET_THE_BUCKET(HT, bucket_number, bkt);
    perform ALLOCATE_REC_BLOCK(blk, addr);
    perform MODIFY_ENTRY_&_HEADER(bkt, blk, addr);
else
    /*  Compare the input bucket_number with the  */
    /*  previous one.  */
    if bucket_number <> old_bucket_number
    then
        perform STORE_BACK(blk);
        /*  Get the desired bucket entry for this  */
        /*  input record.  */
        bkt = HT.bkt_entries[bucket_number];
        /*  Check if the bucket is empty.  */
        if bkt.status = empty
        then
            perform ALLOCATE_REC_BLOCK(blk, addr);
            perform MODIFY_ENTRY_&_HEADER(bkt,
                                          blk, addr);
        else
            /*  Get the record block by the address  */
            /*  in the bucket entry.  */
            perform GET_REC_BLOCK(bkt.block_address,
                                  blk);
        end if;
    end if;
    /*  Check if the block has enough space to  */
    /*  store this record.  */
    I = STRING_LENGTH(info);
    if (blk.header.length + I) > MAX_BLOCK_SIZE
    then
        /*  This block does not have enough space  */
        /*  for this record.  */
        perform ALLOCATE_REC_BLOCK(blk_2,
                                   addr_2);
        perform MODIFY_ENTRY_&_HEADER(bkt,
                                      blk_2,
                                      addr_2);
        /*  This routine will also modify  */
        /*  the header of blk_2.  */
        perform STORE_BACK(blk);
        blk = blk_2;
    end if;
end if;
perform STORE_INFO_IN_BLOCK(info, blk);
end procedure STORE_RECORD_IN_HASH_TABLE;
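
The block chaining behind STORE_RECORD_IN_HASH_TABLE can be sketched in Python. The sketch follows two conventions that appear elsewhere in this appendix: MODIFY_ENTRY_&_HEADER makes a new block point to the bucket's previous block and makes the bucket entry point to the new block, and BROADCAST_TARGET_INFO treats a block whose next address equals its own address as the last block. Everything else is an assumption made for illustration: the class and function names are hypothetical, a Python list stands in for secondary storage, and None stands for the empty bucket status.

```python
class RecBlock:
    def __init__(self, address):
        self.address = address
        self.next_address = address   # the last block points to itself
        self.contents = ""


class BucketEntry:
    def __init__(self):
        self.block_address = None     # None stands for the 'empty' status


def store_info(bucket, blocks, info, max_block_size):
    """Append info to the bucket's current block, chaining a new block when full."""
    def new_block():
        block = RecBlock(address=len(blocks))
        if bucket.block_address is not None:
            block.next_address = bucket.block_address
        blocks.append(block)
        bucket.block_address = block.address
        return block

    if bucket.block_address is None:
        block = new_block()
    else:
        block = blocks[bucket.block_address]
        if len(block.contents) + len(info) > max_block_size:
            block = new_block()
    block.contents += info


blocks = []                # stand-in for secondary storage, indexed by address
bucket = BucketEntry()
for info in ["aaaa", "bbbb", "cccc"]:
    store_info(bucket, blocks, info, max_block_size=8)
print(bucket.block_address, [block.contents for block in blocks])
```

Because new blocks are added at the head of the chain, the bucket entry always leads directly to the block that still has free space.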


procedure STORE_BACK (input: A_structure);
/*****************************************************
*  This procedure is used to store a hashing table,  *
*  or a record block back to the secondary storage.  *
*  *
*  A_structure is a variable which may be either  *
*  a hashing table or a block.  *
*****************************************************/
end procedure STORE_BACK;


procedure GET_REC_BLOCK (input: logical_addr;
                         output: blk);
/*****************************************************
*  This procedure is used to bring a block of memory  *
*  from a predefined secondary storage area into the  *
*  primary memory by its logical address.  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  1. logical_addr:  *
*        The logical address of a block.  *
*        A variable of type addr_definition which is  *
*        defined in the commdata.def file.  *
*  2. blk:  *
*        A variable of type REC_BLOCK which is defined  *
*        in hashing_module.def (see Appendix G).  *
*****************************************************/
end procedure GET_REC_BLOCK;


procedure STORE_INFO_IN_BLOCK (input: info, blk);
/*****************************************************
*  This procedure is used to store the common  *
*  attribute value of a record and the record  *
*  itself into a block.  *
*  It is called only when the block has enough  *
*  space for that information, i.e., info.  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  1. info:  *
*        A character string which contains the  *
*        common attribute value of a record and  *
*        the record itself.  *
*  2. blk:  *
*        A variable of type REC_BLOCK which is  *
*        defined in hashing_module.def (see  *
*        Appendix G).  *
*  3. i, j:  *
*        General purpose indexes.  *
*****************************************************/

i = 0;
j = blk.header.length+1;
repeat
    blk.contents[j] = info[i];
    i = i+1;
    j = j+1;
until i = STRING_LENGTH(info);
end procedure STORE_INFO_IN_BLOCK;


procedure MODIFY_ENTRY_&_HEADER (input: bkt, blk,
                                        blk_addr;
                                 output: bkt, blk);
/*****************************************************
*  This procedure is used to modify the bucket  *
*  entry of the input bkt and the header part  *
*  of the input blk. It will then return these  *
*  modified bkt and blk back to the calling  *
*  procedure.  *
*  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  1. bkt:  *
*        A variable of type Bucket_entry  *
*        which is defined in hashing_module.def  *
*        (see Appendix G).  *
*  2. blk:  *
*        A variable of type REC_BLOCK which  *
*        is defined in hashing_module.def  *
*        (see Appendix G).  *
*  3. blk_addr:  *
*        A variable of type addr_definition  *
*        which is the logical address of a block  *
*        and is defined in the commdata.def file.  *
*****************************************************/

blk.header.next_blk_addr = bkt.block_address;
bkt.block_address = blk_addr;
end procedure MODIFY_ENTRY_&_HEADER;


procedure BROADCAST_TARGET_INFO (input: HT);
/*****************************************************
*  This procedure is used to broadcast the records  *
*  of the target hashing table to the other  *
*  backends.  *
*  This is the same procedure that is used to  *
*  broadcast the descriptor ids among backends.  *
*  Data structures and variables used in this  *
*  procedure are:  *
*  1. HT:  *
*        A variable of type hashing_table  *
*        which is defined in hashing_module.def  *
*        (see Appendix G).  *
*  2. i:  *
*        A general purpose index.  *
*  3. MAX_BKT_#:  *
*        An integer which is used to represent the  *
*        maximum number of the bucket entries in a  *
*        hashing table.  *
*  4. bkt:  *
*        A variable of type Bucket_entry which  *
*        is defined in hashing_module.def (see  *
*        Appendix G).  *
*  5. msg:  *
*        A character string which is used to store  *
*        the message that is to be broadcast to all  *
*        of the backends.  *
*****************************************************/

for i = 1 to MAX_BKT_# do
    bkt = HT.bkt_entries[i];
    if bkt.status <> empty
    then
        /*  Put the bucket number into the message.  */
        perform GET_REC_BLOCK(bkt.block_address, blk);
        last = false;
        repeat
            /*  Extract the contents of the  */
            /*  blk.content and copy them into msg.  */
            if the msg is full
            then
                send msg to all of the backends;
                reset the length of msg to 0;
            end if;
            if blk.next_blk_address = blk.own_address
            then
                /*  This block is the last block for  */
                /*  this bucket.  */
                last = true;
            else
                /*  Otherwise, get the next block of  */
                /*  this bucket.  */
                perform GET_REC_BLOCK(blk.next_blk_address, blk);
            end if;
        until last;
    end if;
end for;

send the msg to all of the other backends;
end procedure BROADCAST_TARGET_INFO;

APPENDIX F

THE MERGING PROCEDURE PROGRAM SPECIFICATIONS

procedure MERGE (input:  source_request_id,
                         logical_address_of_source_table,
                         logical_address_of_target_table);
/*****************************************************
*  This procedure is used to perform the merging
*  operation over the source records and the target
*  records.
*  Notice that the input addresses are the logical
*  disk addresses of the two hashing tables.
*
*  Data structures and variables used in this
*  procedure are:
*  1.  logical_address_of_source_table,
*      logical_address_of_target_table:
*        The logical disk addresses of the source
*        and the target hashing tables, both of the type
*        address_definition, which is defined in the
*        commdata.def file.
*  2.  source_table, target_table:
*        Variables of the hashing-table data type (see
*        Appendix G) that represent the source hashing
*        table and the target hashing table.
*  3.  i:  A general-purpose index.
*  4.  max_bucket_number:
*        The largest bucket number of a hashing table.
*****************************************************/

/*  Retrieve the two hashing tables by the input     */
/*  logical addresses.                               */
/*  Note: Due to the limited memory space, we may    */
/*        not be able to bring in the entire table.  */
perform GET_HASH_TABLE (logical_address_of_source_table,
                        source_table);
perform GET_HASH_TABLE (logical_address_of_target_table,
                        target_table);
/*  Reserve a result buffer.  */
perform GET_BUFFER (result_buffer, source_request_id);
/*  This routine will allocate an instance of a
    result buffer, put the request id into the
    header of the buffer, and initialize the
    length of the buffer to 0.
    This routine has already been coded in
    the retp.c file.  */
i = 0;
while i <= max_bucket_number do
    if [ (source_table.bucket_entry[i].status <> empty)
         and
         (target_table.bucket_entry[i].status <> empty) ]
    then
        /*  There is a collision.  */
        /*  Retrieve the records from both blocks and
            perform the merging operation.  */
        X = source_table.bucket_entry[i].logical_address;
        Y = target_table.bucket_entry[i].logical_address;
        perform MERGING_OPERATION (X, Y, result_buffer);
        /*  This routine will perform the merging
            operation and send the merged results
            to the controller.  */
    end if;
    i = i+1;
end while;

/*  Signal PP upon the completion of the source and  */
/*  target request.                                  */
end procedure MERGE;
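
As an illustration only, the bucket-by-bucket scan of MERGE can be sketched in Python. The names BucketEntry and merge_tables, and the use of in-memory lists in place of tables fetched by logical disk address, are assumptions made for this sketch and are not part of MBDS. The record-level work is passed in as a function, standing in for the merging operation called by MERGE.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

EMPTY, NOT_EMPTY = "0", "1"   # status characters, as defined in Appendix G


@dataclass
class BucketEntry:
    """Hypothetical in-memory stand-in for one hashing-table entry."""
    status: str = EMPTY
    logical_address: Optional[int] = None   # address of the bucket's block


def merge_tables(source_table: List[BucketEntry],
                 target_table: List[BucketEntry],
                 merging_operation: Callable[[int, int], None]) -> int:
    """Call merging_operation for every bucket occupied in both tables.

    Returns the number of such buckets (added here only to ease testing).
    """
    collisions = 0
    for src, tgt in zip(source_table, target_table):
        if src.status != EMPTY and tgt.status != EMPTY:
            merging_operation(src.logical_address, tgt.logical_address)
            collisions += 1
    return collisions


# Usage: three buckets, of which only bucket 1 is occupied in both tables.
source = [BucketEntry(NOT_EMPTY, 10), BucketEntry(NOT_EMPTY, 11), BucketEntry()]
target = [BucketEntry(), BucketEntry(NOT_EMPTY, 21), BucketEntry(NOT_EMPTY, 22)]
pairs = []
count = merge_tables(source, target, lambda x, y: pairs.append((x, y)))
```

Only buckets that are non-empty on both sides reach the record-level comparison, which is why records are hashed on the common attribute before merging.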


procedure MERGING_OPERATION
          (input:   logical_address_source_block,
                    logical_address_target_block,
                    result_buffer;
           output:  result_buffer);
/*****************************************************
*  This procedure is used to perform the following
*  tasks:
*  1.  Extract the records from both the source
*      block and the target block.
*  2.  Compare the common attribute values
*      of the source and target records.
*      If they are equal, then perform the merging
*      operation.
*  3.  Put the merged results into a result buffer.
*      If the buffer is full, then send the buffer
*      to the controller and reinitialize the
*      buffer length to 0 so that the buffer can
*      be reused.
*      Otherwise, return the logical address of
*      the result buffer to the calling procedure.
*
*  Data structures and variables used in this
*  procedure are:
*  1.  source_block, target_block:
*        Variables of the data type BKT_BLK which
*        are used to represent the blocks of the
*        source hashing table or the target hashing
*        table.
*        BKT_BLK is defined in hashing_module.def
*        (see Appendix G).
*  2.  source_done, target_done:
*        Boolean variables which are used to indicate
*        the completion of processing either source
*        records or target records.
*  3.  i, j:  General-purpose indexes.
*****************************************************/

/*  Continue retrieving the source blocks by the     */
/*  logical address until there are no more blocks.  */
source_address = logical_address_source_block;
source_done = false;
repeat
    source_block = GET_BLOCK (source_address);
    /*  Restart at the first target block for each   */
    /*  source block, then continue retrieving the   */
    /*  target blocks by the logical address until   */
    /*  there are no more blocks.                    */
    target_address = logical_address_target_block;
    target_done = false;
    repeat
        target_block = GET_BLOCK (target_address);
        i = 0;
        while source_block.body[i] <> EOB do
            /*  Retrieve one common attribute_value and one  */
            /*  record from the source block.                */
            source_value = GET_VALUE (source_block.body, i);
            source_record = GET_RECORD (source_block.body, i);
            j = 0;
            while target_block.body[j] <> EOB do
                /*  Retrieve one common attribute_value and  */
                /*  one record from the target block.        */
                target_value = GET_VALUE (target_block.body, j);
                target_record = GET_RECORD (target_block.body, j);
                if source_value = target_value
                then
                    /*  Append the target record at the end of the  */
                    /*  source record and put the newly merged      */
                    /*  record into the result buffer.              */
                    result = APPEND (source_record, target_record);
                    result_length = STRING_LENGTH (result);
                    perform RB$PUT_SEND (result_buffer, result,
                                         result_length);
                end if;
                /*  Go to the next target record.  */
                j = j+1;
            end while;    /*  End the target-record loop.  */
            i = i+1;
        end while;    /*  End the source-record loop.  */

        /*  Are the target records done?  */
        if target_block.header.next_block_address =
           target_block.header.this_block_address
        then
            target_done = true;
        else
            target_address = target_block.header.next_block_address;
        end if;
    until target_done;

    /*  Are the source records done?  */
    if source_block.header.next_block_address =
       source_block.header.this_block_address
    then
        source_done = true;
    else
        source_address = source_block.header.next_block_address;
    end if;
until source_done;
end procedure MERGING_OPERATION;
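
The record-level merge can likewise be sketched in Python, under stated assumptions. Here a dictionary keyed by block address stands in for the disk, each block is a pair (records, next block address), and the last block of a chain points to itself, as in the termination test above. The names chain, merging_operation, capacity and send are hypothetical; send plays the role of shipping a full result buffer to the controller.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Record = Tuple[str, str]          # (common attribute value, record text)
Block = Tuple[List[Record], int]  # (records, next block address)


def chain(blocks: Dict[int, Block], address: int) -> Iterator[Record]:
    """Yield the records of a block chain; the last block points to itself."""
    while True:
        records, next_address = blocks[address]
        yield from records
        if next_address == address:
            return
        address = next_address


def merging_operation(blocks: Dict[int, Block],
                      source_address: int,
                      target_address: int,
                      capacity: int,
                      send: Callable[[List[str]], None]) -> List[str]:
    """Nested-loop merge; send() is called whenever the buffer fills up."""
    buffer: List[str] = []
    for s_value, s_record in chain(blocks, source_address):
        # The target chain is rescanned from its first block each time.
        for t_value, t_record in chain(blocks, target_address):
            if s_value == t_value:
                buffer.append(s_record + t_record)
                if len(buffer) == capacity:
                    send(list(buffer))
                    buffer.clear()
    return buffer   # partially filled buffer, left to the caller


# Usage: a two-block source chain (1 -> 2) and a one-block target chain (7).
blocks = {
    1: ([("a", "S1;"), ("b", "S2;")], 2),
    2: ([("a", "S3;")], 2),
    7: ([("a", "T1;"), ("c", "T2;")], 7),
}
sent: List[List[str]] = []
rest = merging_operation(blocks, 1, 7, capacity=2, send=sent.append)
```

In this run the two matches on value "a" fill the buffer exactly once, so one buffer is sent and the returned remainder is empty.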
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APPENDIX    G 
THE HASHING MODULE DATA STRUCTURE DEFINITIONS


In this appendix we present the definitions of the data
structures used in the previous appendices. We refer to the
definitions as hashing_module.def.

1.  hash_buffer:

This is the buffer which stores the hashed information
of records.

    Request_id      -->  The request id of
                         the hashed records.
    Length          -->  The current length
                         of the Hashed_results.
    Hashed_results  -->  An array of characters (a string)
                         used for storing the hashed
                         records.

The format of the Hashed_results is:
    {hashed_record_info}+ EOReq EOB
where
    hashed_record_info   ::=  bucket_number EOV {Rec}+
    Rec                  ::=  {attribute_value_pair}+ EORec
    attribute_value_pair ::=  attribute_name EON attribute_value EOV
    "+" means one or more occurrences.
    EOB   :  A special character which is used as a marker
             for the end-of-buffer.
    EOV   :  A special character which is used as a marker
             for the end-of-value.
    EON   :  A special character which is used as a marker
             for the end-of-attribute_name.
    EORec :  A special character which is used as a marker
             for the end-of-record.
    EOReq :  A character, either 1 or 0, which is
             used to indicate the end of a request.
               1:  end of a request.
               0:  not end of request; more buffers are coming.
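
To make the Hashed_results format concrete, the following Python sketch builds and parses a buffer that follows the grammar above. The marker values chosen for EOB, EOV, EON and EORec are assumptions, since their actual character codes are not given here, and the function names encode and decode are hypothetical. The sketch also assumes that no marker character occurs inside a name or a value.

```python
from typing import Dict, List, Optional, Tuple

# Assumed marker characters; the actual values used in MBDS are not shown.
EOB, EOV, EON, EOREC = "\x04", "\x1f", "\x1e", "\x1d"

Record = Dict[str, str]                 # attribute name -> attribute value
Bucket = Tuple[int, List[Record]]       # (bucket number, records)


def encode(buckets: List[Bucket], end_of_request: bool) -> str:
    """Build {hashed_record_info}+ EOReq EOB as one character string."""
    parts: List[str] = []
    for number, records in buckets:
        parts.append(f"{number}{EOV}")
        for record in records:
            for name, value in record.items():
                parts.append(f"{name}{EON}{value}{EOV}")
            parts.append(EOREC)
    parts.append("1" if end_of_request else "0")   # EOReq
    parts.append(EOB)
    return "".join(parts)


def decode(text: str) -> Tuple[List[Bucket], bool]:
    """Parse a buffer produced by encode()."""
    buckets: List[Bucket] = []
    record: Record = {}
    token = ""
    name: Optional[str] = None
    for ch in text:
        if ch == EON:               # token was an attribute name
            name, token = token, ""
        elif ch == EOV:
            if name is None:        # token was a bucket number
                buckets.append((int(token), []))
            else:                   # token was an attribute value
                record[name] = token
            name, token = None, ""
        elif ch == EOREC:           # close the current record
            buckets[-1][1].append(record)
            record = {}
        elif ch == EOB:             # token is the EOReq flag
            return buckets, token == "1"
        else:
            token += ch
    raise ValueError("missing end-of-buffer marker")


# Usage: two buckets, the second holding a single one-attribute record.
data = [(5, [{"name": "Ann", "dept": "CS"}]), (9, [{"name": "Bob"}])]
wire = encode(data, end_of_request=True)
parsed, last = decode(wire)
```

A token ended by EOV is a bucket number unless an attribute name is pending, which is how the parser tells the start of a new hashed_record_info from another attribute-value pair.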

2.  REC_BLOCK

Blocks used by buckets to store the records and their
common attribute values.

A REC_BLOCK is composed of two fields, a header
and a contents.

    header    -->  This part contains the status
                   of this block.
    contents  -->  This part contains the records
                   and their common attribute values.

The format of the contents of the REC_BLOCK is:
    {Rec}+ EOB

The header contains two parts:

    length              -->  An integer to indicate the total
                             length of the records in this
                             block.
    next_block_address  -->  The logical address of the next
                             block of the same bucket. (If
                             this block is the first block of
                             the bucket, then a null address
                             will be put in here.)
                             The type of this field is
                             address_definition and is
                             defined in the commdata.def file.
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3.    Bucket_entry 


    status         -->  A character which is either 1 for
                        "not empty" or 0 for "empty".
    block_address  -->  The logical address of the block
                        of this bucket.

4.  Hash_table:  An array of 2048 bucket_entries.